Foreword


The present book is meant as a text for a course in linear algebra, at the 
undergraduate level in the upper division. 

My Introduction to Linear Algebra provides a text for beginning students,
at the same level as introductory calculus courses. The present
book is meant to serve at the next level, essentially for a second course
in linear algebra, where the emphasis is on the various structure
theorems: eigenvalues and eigenvectors (which at best could occur only
rapidly at the end of the introductory course); symmetric, hermitian and
unitary operators, as well as their spectral theorem (diagonalization);
triangulation of matrices and linear maps; Jordan canonical form; convex
sets and the Krein-Milman theorem. One chapter also provides a complete
theory of the basic properties of determinants. Only a partial treatment
could be given in the introductory text. Of course, some parts of
this chapter can still be omitted in a given course.

The chapter on convex sets is included because it contains basic results
of linear algebra used in many applications and “geometric” linear
algebra. Because logically it uses results from elementary analysis (for
instance, that a continuous function on a closed bounded set has a
maximum) I put it at the end. If such results are known to a class, the
chapter can be covered much earlier, for instance once the class knows the
definition of a linear map.

I hope that the present book can be used for a one-term course. The
first six chapters review some of the basic notions. I looked for efficiency.
Thus the theorem that m homogeneous linear equations in n
unknowns have a non-trivial solution if n > m is deduced from the dimension
theorem rather than the other way around as in the introductory
text. And the proof that two bases have the same number of elements
(i.e. that dimension is defined) is done rapidly by the “interchange”
method. I have also omitted a discussion of elementary matrices, and
Gauss elimination, which are thoroughly covered in my Introduction to
Linear Algebra. Hence the first part of the present book is not a substitute
for the introductory text. It is only meant to make the present book
self-contained, with a relatively quick treatment of the more basic material,
and with the emphasis on the more advanced chapters. Today’s
curriculum is set up in such a way that most students, if not all, will
have taken an introductory one-term course whose emphasis is on
matrix manipulation. Hence a second course must be directed toward
the structure theorems.

Appendix 1 gives the definition and basic properties of the complex 
numbers. This includes the algebraic closure. The proof of course must 
take for granted some elementary facts of analysis, but no theory of 
complex variables is used. 

Appendix 2 treats the Iwasawa decomposition, in a topic where the 
group theoretic aspects begin to intermingle seriously with the purely linear 
algebra aspects. This appendix could (should?) also be treated in the 
general undergraduate algebra course. 

Although from the start I take vector spaces over fields which are 
subfields of the complex numbers, this is done for convenience, and to 
avoid drawn out foundations. Instructors can emphasize as they wish 
that only the basic properties of addition, multiplication, and division are 
used throughout, with the important exception, of course, of those theories
which depend on a positive definite scalar product. In such cases, the
real and complex numbers play an essential role. 

New Haven, Connecticut
Serge Lang


Acknowledgments 

I thank Ron Infante and Peter Pappas for assisting with the proofreading
and for useful suggestions and corrections. I also thank Gimli Khazad for 
his corrections. 


S.L. 



Contents 


CHAPTER I

Vector Spaces

§1. Definitions
§2. Bases
§3. Dimension of a Vector Space
§4. Sums and Direct Sums

CHAPTER II

Matrices

§1. The Space of Matrices
§2. Linear Equations
§3. Multiplication of Matrices

CHAPTER III

Linear Mappings

§1. Mappings
§2. Linear Mappings
§3. The Kernel and Image of a Linear Map
§4. Composition and Inverse of Linear Mappings
§5. Geometric Applications

CHAPTER IV

Linear Maps and Matrices

§1. The Linear Map Associated with a Matrix
§2. The Matrix Associated with a Linear Map
§3. Bases, Matrices, and Linear Maps

CHAPTER V

Scalar Products and Orthogonality

§1. Scalar Products
§2. Orthogonal Bases, Positive Definite Case
§3. Application to Linear Equations; the Rank
§4. Bilinear Maps and Matrices
§5. General Orthogonal Bases
§6. The Dual Space and Scalar Products
§7. Quadratic Forms
§8. Sylvester’s Theorem

CHAPTER VI

Determinants

§1. Determinants of Order 2
§2. Existence of Determinants
§3. Additional Properties of Determinants
§4. Cramer’s Rule
§5. Triangulation of a Matrix by Column Operations
§6. Permutations
§7. Expansion Formula and Uniqueness of Determinants
§8. Inverse of a Matrix
§9. The Rank of a Matrix and Subdeterminants

CHAPTER VII

Symmetric, Hermitian, and Unitary Operators

§1. Symmetric Operators
§2. Hermitian Operators
§3. Unitary Operators

CHAPTER VIII

Eigenvectors and Eigenvalues

§1. Eigenvectors and Eigenvalues
§2. The Characteristic Polynomial
§3. Eigenvalues and Eigenvectors of Symmetric Matrices
§4. Diagonalization of a Symmetric Linear Map
§5. The Hermitian Case
§6. Unitary Operators

CHAPTER IX

Polynomials and Matrices

§1. Polynomials
§2. Polynomials of Matrices and Linear Maps

CHAPTER X

Triangulation of Matrices and Linear Maps

§1. Existence of Triangulation
§2. Theorem of Hamilton-Cayley
§3. Diagonalization of Unitary Maps

CHAPTER XI

Polynomials and Primary Decomposition

§1. The Euclidean Algorithm
§2. Greatest Common Divisor
§3. Unique Factorization
§4. Application to the Decomposition of a Vector Space
§5. Schur’s Lemma
§6. The Jordan Normal Form

CHAPTER XII

Convex Sets

§1. Definitions
§2. Separating Hyperplanes
§3. Extreme Points and Supporting Hyperplanes
§4. The Krein-Milman Theorem

APPENDIX I

Complex Numbers

APPENDIX II

Iwasawa Decomposition and Others

Index
























CHAPTER I 


Vector Spaces 


As usual, a collection of objects will be called a set. A member of the
collection is also called an element of the set. It is useful in practice to
use short symbols to denote certain sets. For instance, we denote by R
the set of all real numbers, and by C the set of all complex numbers. To
say that “x is a real number” or that “x is an element of R” amounts to
the same thing. The set of all n-tuples of real numbers will be denoted
by $R^n$. Thus “X is an element of $R^n$” and “X is an n-tuple of real
numbers” mean the same thing. A review of the definition of C and its
properties is given in an Appendix.

Instead of saying that u is an element of a set S, we shall also frequently
say that u lies in S and write $u \in S$. If S and S' are sets, and if
every element of S' is an element of S, then we say that S' is a subset of
S. Thus the set of real numbers is a subset of the set of complex
numbers. To say that S' is a subset of S is to say that S' is part of S.
Observe that our definition of a subset does not exclude the possibility
that S' = S. If S' is a subset of S, but $S' \neq S$, then we shall say that S' is
a proper subset of S. Thus C is a subset of C, but R is a proper subset
of C. To denote the fact that S' is a subset of S, we write $S' \subset S$, and
also say that S' is contained in S.

If $S_1$, $S_2$ are sets, then the intersection of $S_1$ and $S_2$, denoted by
$S_1 \cap S_2$, is the set of elements which lie in both $S_1$ and $S_2$. The union of
$S_1$ and $S_2$, denoted by $S_1 \cup S_2$, is the set of elements which lie in $S_1$ or
in $S_2$.

I, §1. DEFINITIONS

Let K be a subset of the complex numbers C. We shall say that K is a 
field if it satisfies the following conditions: 

(a) If x, y are elements of K, then x + y and xy are also elements of
K.

(b) If $x \in K$, then -x is also an element of K. If furthermore $x \neq 0$,
then $x^{-1}$ is an element of K.

(c) The elements 0 and 1 are elements of K. 

We observe that both R and C are fields. 

Let us denote by Q the set of rational numbers, i.e. the set of all fractions
m/n, where m, n are integers, and $n \neq 0$. Then it is easily verified
that Q is a field.

Let Z denote the set of all integers. Then Z is not a field, because
condition (b) above is not satisfied. Indeed, if n is an integer $\neq 0$, then
$n^{-1} = 1/n$ is not an integer (except in the trivial case that n = 1 or
n = -1). For instance 1/2 is not an integer.
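
As a small illustration of conditions (a), (b), (c), the following Python sketch uses the standard `fractions.Fraction` type to stand in for Q. The helper name `in_Q` and the sample values are assumptions made for this example only, and a spot check of this kind illustrates the conditions without proving them.

```python
from fractions import Fraction

def in_Q(x):
    """True if x is represented as a rational number (a Fraction)."""
    return isinstance(x, Fraction)

# Two sample rational numbers (arbitrary choices).
x, y = Fraction(3, 4), Fraction(-5, 6)

# Condition (a): sums and products of rationals are rational.
closed_under_sum_and_product = in_Q(x + y) and in_Q(x * y)

# Condition (b): negatives and inverses of non-zero rationals are rational.
closed_under_neg_and_inverse = in_Q(-x) and in_Q(1 / x)

# Condition (c): 0 and 1 are rational numbers.
has_zero_and_one = in_Q(Fraction(0)) and in_Q(Fraction(1))

# In Z, condition (b) fails: the inverse of the integer 2 is 1/2,
# whose denominator is not 1, so it is not an integer.
inverse_of_two = 1 / Fraction(2)
inverse_is_integer = inverse_of_two.denominator == 1
```

Exact rational arithmetic is used rather than floating point so that equalities hold exactly.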

The essential thing about a field is that it is a set of elements which
can be added and multiplied, in such a way that addition and multiplication
satisfy the ordinary rules of arithmetic, and in such a way that one
can divide by non-zero elements. It is possible to axiomatize the notion 
further, but we shall do so only later, to avoid abstract discussions which 
become obvious anyhow when the reader has acquired the necessary 
mathematical maturity. Taking into account this possible generalization, 
we should say that a field as we defined it above is a field of (complex) 
numbers. However, we shall call such fields simply fields. 

The reader may restrict attention to the fields of real and complex 
numbers for the entire linear algebra. Since, however, it is necessary to 
deal with each one of these fields, we are forced to choose a neutral 
letter K. 

Let K, L be fields, and suppose that K is contained in L (i.e. that K
is a subset of L). Then we shall say that K is a subfield of L. Thus
every one of the fields which we are considering is a subfield of the complex
numbers. In particular, we can say that R is a subfield of C, and Q
is a subfield of R.

Let K be a field. Elements of K will also be called numbers (without 
specification) if the reference to K is made clear by the context, or they 
will be called scalars. 

A vector space V over the field K is a set of objects which can be
added and multiplied by elements of K, in such a way that the sum of
two elements of V is again an element of V, the product of an element of
V by an element of K is an element of V, and the following properties
are satisfied:

VS 1. Given elements u, v, w of V, we have

(u + v) + w = u + (v + w).

VS 2. There is an element of V, denoted by O, such that

O + u = u + O = u

for all elements u of V.

VS 3. Given an element u of V, there exists an element -u in V such
that

u + (-u) = O.

VS 4. For all elements u, v of V, we have

u + v = v + u.

VS 5. If c is a number, then c(u + v) = cu + cv.

VS 6. If a, b are two numbers, then (a + b)v = av + bv.

VS 7. If a, b are two numbers, then (ab)v = a(bv).

VS 8. For all elements u of V, we have 1·u = u (1 here is the number
one).

We have used all these rules when dealing with vectors, or with functions,
but we wish to be more systematic from now on, and hence have
made a list of them. Further properties which can be easily deduced 
from these are given in the exercises and will be assumed from now on. 

Example 1. Let $V = K^n$ be the set of n-tuples of elements of K. Let

$$A = (a_1, \ldots, a_n) \quad\text{and}\quad B = (b_1, \ldots, b_n)$$

be elements of $K^n$. We call $a_1, \ldots, a_n$ the components, or coordinates, of A.
We define

$$A + B = (a_1 + b_1, \ldots, a_n + b_n).$$

If $c \in K$ we define

$$cA = (ca_1, \ldots, ca_n).$$

Then it is easily verified that all the properties VS 1 through VS 8 are
satisfied. The zero element is the n-tuple

$$O = (0, \ldots, 0)$$

with all its coordinates equal to 0.
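
The componentwise operations of Example 1 can be sketched in Python, taking K = Q and n = 3. The function names `add`, `scale`, `zero` and the sample vectors are assumptions chosen for illustration. Checking VS 1 through VS 8 on a few sample elements is a sanity check and not a proof.

```python
from fractions import Fraction

def add(A, B):
    """Componentwise sum of two n-tuples."""
    return tuple(a + b for a, b in zip(A, B))

def scale(c, A):
    """Product of the n-tuple A by the number c."""
    return tuple(c * a for a in A)

def zero(n):
    """The zero element O = (0, ..., 0) of K^n."""
    return tuple(Fraction(0) for _ in range(n))

# Sample elements of Q^3 and sample numbers (arbitrary choices).
u = (Fraction(1), Fraction(2, 3), Fraction(-1))
v = (Fraction(0), Fraction(5), Fraction(1, 2))
w = (Fraction(7, 4), Fraction(-3), Fraction(2))
a, b = Fraction(2, 5), Fraction(-3)
O = zero(3)

# Each entry records whether the named property holds on the samples.
# In VS 3 the element -u is taken to be the componentwise negative of u.
checks = {
    "VS 1": add(add(u, v), w) == add(u, add(v, w)),
    "VS 2": add(O, u) == u and add(u, O) == u,
    "VS 3": add(u, scale(-1, u)) == O,
    "VS 4": add(u, v) == add(v, u),
    "VS 5": scale(a, add(u, v)) == add(scale(a, u), scale(a, v)),
    "VS 6": scale(a + b, v) == add(scale(a, v), scale(b, v)),
    "VS 7": scale(a * b, v) == scale(a, scale(b, v)),
    "VS 8": scale(1, u) == u,
}
```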

Thus $C^n$ is a vector space over C, and $Q^n$ is a vector space over Q.
We remark that $R^n$ is not a vector space over C. Thus when dealing
with vector spaces, we shall always specify the field over which we take
the vector space. When we write $K^n$, it will always be understood that it
is meant as a vector space over K. Elements of $K^n$ will also be called
vectors and it is also customary to call elements of an arbitrary vector
space vectors.

If u, v are vectors (i.e. elements of the arbitrary vector space V), then

u + (-v)

is usually written u - v.

We shall use 0 to denote the number zero, and O to denote the element
of any vector space V satisfying property VS 2. We also call it
zero, but there is never any possibility of confusion. We observe that
this zero element O is uniquely determined by condition VS 2 (cf. Exercise 5).

Observe that for any element v in V we have

0v = O.

The proof is easy, namely

0v + v = 0v + 1v = (0 + 1)v = 1v = v.

Adding -v to both sides shows that 0v = O.

Other easy properties of a similar type will be used constantly and are 
given as exercises. For instance, prove that (-1)v = -v.

It is possible to add several elements of a vector space. Suppose we
wish to add four elements, say u, v, w, z. We first add any two of them,
then a third, and finally a fourth. Using the rules VS 1 and VS 4, we see
that it does not matter in which order we perform the additions. This is
exactly the same situation as we had with vectors. For example, we have

((u + v) + w) + z = (u + (v + w)) + z
                  = ((v + w) + u) + z
                  = (v + w) + (u + z), etc.

Thus it is customary to leave out the parentheses, and write simply

u + v + w + z.

The same remark applies to the sum of any number n of elements of V, 
and a formal proof could be given by induction. 

Let V be a vector space, and let W be a subset of V. We define W to
be a subspace if W satisfies the following conditions:

(i) If v, w are elements of W, their sum v + w is also an element of
W.

(ii) If v is an element of W and c a number, then cv is an element of
W.

(iii) The element O of V is also an element of W.

Then W itself is a vector space. Indeed, properties VS 1 through VS 8,
being satisfied for all elements of V, are satisfied a fortiori for the elements
of W.

Example 2. Let $V = K^n$ and let W be the set of vectors in V whose last
coordinate is equal to 0. Then W is a subspace of V, which we could
identify with $K^{n-1}$.

Linear Combinations. Let V be an arbitrary vector space, and let
$v_1, \ldots, v_n$ be elements of V. Let $x_1, \ldots, x_n$ be numbers. An expression of
type

$$x_1 v_1 + \cdots + x_n v_n$$

is called a linear combination of $v_1, \ldots, v_n$.

Let W be the set of all linear combinations of $v_1, \ldots, v_n$. Then W is a
subspace of V.

Proof. Let $y_1, \ldots, y_n$ be numbers. Then

$$(x_1 v_1 + \cdots + x_n v_n) + (y_1 v_1 + \cdots + y_n v_n) = (x_1 + y_1)v_1 + \cdots + (x_n + y_n)v_n.$$

Thus the sum of two elements of W is again an element of W, i.e. a
linear combination of $v_1, \ldots, v_n$. Furthermore, if c is a number, then

$$c(x_1 v_1 + \cdots + x_n v_n) = cx_1 v_1 + \cdots + cx_n v_n$$

is a linear combination of $v_1, \ldots, v_n$, and hence is an element of W.

Finally,

$$O = 0v_1 + \cdots + 0v_n$$

is an element of W. This proves that W is a subspace of V.

The subspace W as above is called the subspace generated by
$v_1, \ldots, v_n$. If W = V, i.e. if every element of V is a linear combination of
$v_1, \ldots, v_n$, then we say that $v_1, \ldots, v_n$ generate V.
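
The three steps of the preceding proof can be traced on a concrete case. The sketch below, in Python, assumes vectors in $Q^3$ represented as tuples; the function name `linear_combination` and the sample data are hypothetical choices for this illustration.

```python
from fractions import Fraction

def linear_combination(coeffs, vectors):
    """Return x_1 v_1 + ... + x_n v_n for vectors given as equal-length tuples."""
    dim = len(vectors[0])
    total = [Fraction(0)] * dim
    for x, v in zip(coeffs, vectors):
        for i in range(dim):
            total[i] += x * v[i]
    return tuple(total)

# Two sample vectors and two sample sets of coefficients.
v1, v2 = (1, 0, 2), (0, 1, -1)
xs = (Fraction(1, 2), Fraction(3))
ys = (Fraction(-2), Fraction(1, 3))
c = Fraction(5)

p = linear_combination(xs, (v1, v2))
q = linear_combination(ys, (v1, v2))

# The sum of two combinations is the combination with coefficients x_i + y_i.
sum_pq = tuple(a + b for a, b in zip(p, q))
sum_closed = sum_pq == linear_combination(
    tuple(x + y for x, y in zip(xs, ys)), (v1, v2))

# c times a combination is the combination with coefficients c * x_i.
scaled_closed = tuple(c * a for a in p) == linear_combination(
    tuple(c * x for x in xs), (v1, v2))

# The zero vector is the combination with all coefficients equal to 0.
zero_in_W = linear_combination((0, 0), (v1, v2)) == (0, 0, 0)
```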

Example 3. Let $V = K^n$. Let $A, B \in K^n$, with $A = (a_1, \ldots, a_n)$ and
$B = (b_1, \ldots, b_n)$. We define the dot product or scalar product

$$A \cdot B = a_1 b_1 + \cdots + a_n b_n.$$

It is then easy to verify the following properties.

SP 1. We have $A \cdot B = B \cdot A$.

SP 2. If A, B, C are three vectors, then

$$A \cdot (B + C) = A \cdot B + A \cdot C = (B + C) \cdot A.$$

SP 3. If $x \in K$ then

$$(xA) \cdot B = x(A \cdot B) \quad\text{and}\quad A \cdot (xB) = x(A \cdot B).$$

We shall now prove these properties. 

Concerning the first, we have

$$a_1 b_1 + \cdots + a_n b_n = b_1 a_1 + \cdots + b_n a_n,$$

because for any two numbers a, b, we have ab = ba. This proves the
first property.

For SP 2, let $C = (c_1, \ldots, c_n)$. Then

$$B + C = (b_1 + c_1, \ldots, b_n + c_n)$$

and

$$A \cdot (B + C) = a_1(b_1 + c_1) + \cdots + a_n(b_n + c_n)$$
$$= a_1 b_1 + a_1 c_1 + \cdots + a_n b_n + a_n c_n.$$

Reordering the terms yields

$$a_1 b_1 + \cdots + a_n b_n + a_1 c_1 + \cdots + a_n c_n,$$

which is none other than $A \cdot B + A \cdot C$. This proves what we wanted.

We leave property SP 3 as an exercise. 

Instead of writing $A \cdot A$ for the scalar product of a vector with itself, it
will be convenient to write also $A^2$. (This is the only instance when we
allow ourselves such a notation. Thus $A^3$ has no meaning.) As an exercise,
verify the following identities:

$$(A + B)^2 = A^2 + 2A \cdot B + B^2,$$

$$(A - B)^2 = A^2 - 2A \cdot B + B^2.$$


A dot product $A \cdot B$ may very well be equal to 0 without either A or
B being the zero vector. For instance, let A = (1, 2, 3) and B = (2, 1, -4/3).
Then $A \cdot B = 0$.
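
The properties SP 1 through SP 3, the identity for the square of A + B, and the example of two non-zero vectors with dot product 0 can be checked on sample data. The sketch below is in Python over Q; the names `dot`, `add`, `scale` and the vector C are assumptions for illustration, and B is taken to be (2, 1, -4/3), the value for which the product with A = (1, 2, 3) vanishes.

```python
from fractions import Fraction

def dot(A, B):
    """Dot product a_1 b_1 + ... + a_n b_n of two n-tuples."""
    return sum(a * b for a, b in zip(A, B))

def add(A, B):
    """Componentwise sum of two n-tuples."""
    return tuple(a + b for a, b in zip(A, B))

def scale(x, A):
    """Product of the n-tuple A by the number x."""
    return tuple(x * a for a in A)

A = (Fraction(1), Fraction(2), Fraction(3))
B = (Fraction(2), Fraction(1), Fraction(-4, 3))
C = (Fraction(1, 2), Fraction(0), Fraction(5))   # arbitrary third vector
x = Fraction(7, 3)                               # arbitrary number

sp1 = dot(A, B) == dot(B, A)
sp2 = dot(A, add(B, C)) == dot(A, B) + dot(A, C) == dot(add(B, C), A)
sp3 = dot(scale(x, A), B) == x * dot(A, B) == dot(A, scale(x, B))

# (A + B)^2 = A^2 + 2 A.B + B^2, where X^2 means X.X
square_identity = (dot(add(A, B), add(A, B))
                   == dot(A, A) + 2 * dot(A, B) + dot(B, B))

# A and B are both non-zero, yet their dot product is 0.
perpendicular = dot(A, B) == 0
```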


We define two vectors A, B to be perpendicular (or as we shall also
say, orthogonal) if $A \cdot B = 0$. Let A be a vector in $K^n$. Let W be the set
of all elements B in $K^n$ such that $B \cdot A = 0$, i.e. such that B is perpendicular
to A. Then W is a subspace of $K^n$. To see this, note that
$O \cdot A = 0$, so that O is in W. Next, suppose that B, C are perpendicular to
A. Then

$$(B + C) \cdot A = B \cdot A + C \cdot A = 0,$$

so that B + C is also perpendicular to A. Finally, if x is a number, then

$$(xB) \cdot A = x(B \cdot A) = 0,$$

so that xB is perpendicular to A. This proves that W is a subspace of
$K^n$.

Example 4. Function Spaces. Let S be a set and K a field. By a function
of S into K we shall mean an association which to each element of
S associates a unique element of K. Thus if f is a function of S into K,
we express this by the symbols

$$f\colon S \to K.$$

We also say that f is a K-valued function. Let V be the set of all functions
of S into K. If f, g are two such functions, then we can form their
sum f + g. It is the function whose value at an element x of S is
f(x) + g(x). We write

(f + g)(x) = f(x) + g(x).

If $c \in K$, then we define cf to be the function such that

(cf)(x) = cf(x).

Thus the value of cf at x is cf(x). It is then a very easy matter to verify
that V is a vector space over K. We shall leave this to the reader. We
observe merely that the zero element of V is the zero function, i.e. the
function f such that f(x) = 0 for all $x \in S$. We shall denote this zero
function by 0.

Let V be the set of all functions of R into R. Then V is a vector
space over R. Let W be the subset of continuous functions. If f, g are
continuous functions, then f + g is continuous. If c is a real number,
then cf is continuous. The zero function is continuous. Hence W is a
subspace of the vector space of all functions of R into R, i.e. W is a subspace
of V.

Let U be the set of differentiable functions of R into R. If f, g are
differentiable functions, then their sum f + g is also differentiable. If c is
a real number, then cf is differentiable. The zero function is differentiable.
Hence U is a subspace of V. In fact, U is a subspace of W, because
every differentiable function is continuous.

Let V again be the vector space (over R) of functions from R into R.
Consider the two functions $e^t$, $e^{2t}$. (Strictly speaking, we should say the
two functions f, g such that $f(t) = e^t$ and $g(t) = e^{2t}$ for all $t \in R$.) These
functions generate a subspace of the space of all differentiable functions.
The function $3e^t + 2e^{2t}$ is an element of this subspace. So is the function
$2e^t + \pi e^{2t}$.
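
The sum and scalar multiple of functions defined in Example 4 translate directly into code, with functions of R into R modelled as Python callables on floating-point numbers. This is only a sketch: the names `add`, `scale`, `zero` are assumptions, and floats merely approximate real numbers.

```python
import math

def add(f, g):
    """The sum f + g: its value at x is f(x) + g(x)."""
    return lambda x: f(x) + g(x)

def scale(c, f):
    """The product cf: its value at x is c * f(x)."""
    return lambda x: c * f(x)

def zero(x):
    """The zero function, whose value is 0 everywhere."""
    return 0.0

def f(t):
    """f(t) = e^t"""
    return math.exp(t)

def g(t):
    """g(t) = e^{2t}"""
    return math.exp(2 * t)

# The element 3e^t + 2e^{2t} of the subspace generated by f and g.
h = add(scale(3, f), scale(2, g))

# Adding the zero function does not change the values of h.
h_plus_zero = add(h, zero)
```

For instance, `h(0.0)` evaluates 3 + 2, since both exponentials equal 1 at t = 0.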

Example 5. Let V be a vector space and let U, W be subspaces. We
denote by $U \cap W$ the intersection of U and W, i.e. the set of elements
which lie both in U and W. Then $U \cap W$ is a subspace. For instance, if
U, W are two planes in 3-space passing through the origin, then in general,
their intersection will be a straight line passing through the origin,
as shown in Fig. 1.

[Figure 1]

Example 6. Let U, W be subspaces of a vector space V. By

U + W

we denote the set of all elements u + w with $u \in U$ and $w \in W$. Then we
leave it to the reader to verify that U + W is a subspace of V, said to be
generated by U and W, and called the sum of U and W.


I, §1. EXERCISES 

1. Let V be a vector space. Using the properties VS 1 through VS 8, show that
if c is a number, then cO = O.

2. Let c be a number $\neq 0$, and v an element of V. Prove that if cv = O, then
v = O.

3. In the vector space of functions, what is the function satisfying the condition 
VS 2? 

4. Let V be a vector space and v, w two elements of V. If v + w = O, show that
w = -v.

5. Let V be a vector space, and v, w two elements of V such that v + w = v.
Show that w = O.

6. Let $A_1$, $A_2$ be vectors in $R^n$. Show that the set of all vectors B in $R^n$ such
that B is perpendicular to both $A_1$ and $A_2$ is a subspace.

7. Generalize Exercise 6, and prove: Let $A_1, \ldots, A_r$ be vectors in $R^n$. Let W be
the set of vectors B in $R^n$ such that $B \cdot A_i = 0$ for every i = 1, ..., r. Show that
W is a subspace of $R^n$.

8. Show that the following sets of elements in $R^2$ form subspaces.

(a) The set of all (x, y) such that x = y.

(b) The set of all (x, y) such that x - y = 0.

(c) The set of all (x, y) such that x + 4y = 0.

9. Show that the following sets of elements in $R^3$ form subspaces.

(a) The set of all (x, y, z) such that x + y + z = 0. 

(b) The set of all (x, y, z) such that x = y and 2y = z. 

(c) The set of all (x, y, z) such that x + y = 3z. 

10. If U, W are subspaces of a vector space V, show that $U \cap W$ and U + W are
subspaces.

11. Let K be a subfield of a field L. Show that L is a vector space over K. In
particular, C and R are vector spaces over Q. 

12. Let K be the set of all numbers which can be written in the form $a + b\sqrt{2}$,
where a, b are rational numbers. Show that K is a field.

13. Let K be the set of all numbers which can be written in the form a + bi,
where a, b are rational numbers. Show that K is a field.

14. Let c be a rational number > 0, and let y be a real number such that $y^2 = c$.
Show that the set of all numbers which can be written in the form a + by,
where a, b are rational numbers, is a field.


I, §2. BASES 

Let V be a vector space over the field K, and let $v_1, \ldots, v_n$ be elements of
V. We shall say that $v_1, \ldots, v_n$ are linearly dependent over K if there exist
elements $a_1, \ldots, a_n$ in K not all equal to 0 such that

$$a_1 v_1 + \cdots + a_n v_n = O.$$

If there do not exist such numbers, then we say that $v_1, \ldots, v_n$ are linearly
independent. In other words, vectors $v_1, \ldots, v_n$ are linearly independent if
and only if the following condition is satisfied:

Whenever $a_1, \ldots, a_n$ are numbers such that

$$a_1 v_1 + \cdots + a_n v_n = O,$$

then $a_i = 0$ for all i = 1, ..., n.


Example 1. Let $V = K^n$ and consider the vectors

$$E_1 = (1, 0, \ldots, 0), \quad \ldots, \quad E_n = (0, 0, \ldots, 1).$$

Then $E_1, \ldots, E_n$ are linearly independent. Indeed, let $a_1, \ldots, a_n$ be numbers
such that

$$a_1 E_1 + \cdots + a_n E_n = O.$$

Since

$$a_1 E_1 + \cdots + a_n E_n = (a_1, \ldots, a_n),$$

it follows that all $a_i = 0$.

Example 2. Let V be the vector space of all functions of a variable t.
Let $f_1, \ldots, f_n$ be n functions. To say that they are linearly dependent is
to say that there exist n numbers $a_1, \ldots, a_n$ not all equal to 0 such that

$$a_1 f_1(t) + \cdots + a_n f_n(t) = 0$$

for all values of t.

The two functions $e^t$, $e^{2t}$ are linearly independent. To prove this, suppose
that there are numbers a, b such that

$$ae^t + be^{2t} = 0$$

(for all values of t). Differentiate this relation. We obtain

$$ae^t + 2be^{2t} = 0.$$

Subtract the first from the second relation. We obtain $be^{2t} = 0$, and
hence b = 0. From the first relation, it follows that $ae^t = 0$, and hence
a = 0. Hence $e^t$, $e^{2t}$ are linearly independent.
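
The same conclusion can be cross-checked numerically by a different route from the differentiation argument above. If $ae^t + be^{2t} = 0$ for all t, then in particular it holds at t = 0 and t = 1, which says that a(1, e) + b(1, e^2) = O in the plane. By the criterion of Exercise 4 of this section, the vectors (1, e) and (1, e^2) are linearly independent when the quantity ad - bc formed from them is non-zero. The Python sketch below evaluates that quantity; the choice of sample points and the tolerance are assumptions of the example.

```python
import math

# Values of e^t and e^{2t} at the sample points t = 0 and t = 1.
col_f = (math.exp(0.0), math.exp(1.0))   # (1, e)
col_g = (math.exp(0.0), math.exp(2.0))   # (1, e^2)

# The quantity ad - bc for the vectors (1, e) and (1, e^2).
det = col_f[0] * col_g[1] - col_f[1] * col_g[0]

# A value clearly away from 0 forces a = b = 0 in the relation above.
independent = abs(det) > 1e-9
```

The value of `det` is $e^2 - e$, which is visibly non-zero, so floating-point rounding does not affect the conclusion here.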

If elements $v_1, \ldots, v_n$ of V generate V and in addition are linearly independent,
then $\{v_1, \ldots, v_n\}$ is called a basis of V. We shall also say that the
elements $v_1, \ldots, v_n$ constitute or form a basis of V.

The vectors $E_1, \ldots, E_n$ of Example 1 form a basis of $K^n$.

Let W be the vector space of functions generated by the two functions
$e^t$, $e^{2t}$. Then $\{e^t, e^{2t}\}$ is a basis of W.

We shall now define the coordinates of an element $v \in V$ with respect
to a basis. The definition depends on the following fact.

Theorem 2.1. Let V be a vector space. Let $v_1, \ldots, v_n$ be linearly independent
elements of V. Let $x_1, \ldots, x_n$ and $y_1, \ldots, y_n$ be numbers. Suppose
that we have

$$x_1 v_1 + \cdots + x_n v_n = y_1 v_1 + \cdots + y_n v_n.$$

Then $x_i = y_i$ for i = 1, ..., n.

Proof. Subtracting the right-hand side from the left-hand side, we get

$$x_1 v_1 - y_1 v_1 + \cdots + x_n v_n - y_n v_n = O.$$

We can write this relation also in the form

$$(x_1 - y_1)v_1 + \cdots + (x_n - y_n)v_n = O.$$

By definition, we must have $x_i - y_i = 0$ for all i = 1, ..., n, thereby proving
our assertion.

Let V be a vector space, and let $\{v_1, \ldots, v_n\}$ be a basis of V. The elements
of V can be represented by n-tuples relative to this basis, as follows.
If an element v of V is written as a linear combination

$$v = x_1 v_1 + \cdots + x_n v_n,$$

then by the above remark, the n-tuple $(x_1, \ldots, x_n)$ is uniquely determined
by v. We call $(x_1, \ldots, x_n)$ the coordinates of v with respect to our basis,
and we call $x_i$ the i-th coordinate. The coordinates with respect to the
usual basis $E_1, \ldots, E_n$ of $K^n$ are the coordinates of the n-tuple X. We say
that the n-tuple $X = (x_1, \ldots, x_n)$ is the coordinate vector of v with respect
to the basis $\{v_1, \ldots, v_n\}$.

Example 3. Let V be the vector space of functions generated by the
two functions $e^t$, $e^{2t}$. Then the coordinates of the function

$$3e^t + 5e^{2t}$$

with respect to the basis $\{e^t, e^{2t}\}$ are (3, 5).

Example 4. Show that the vectors (1, 1) and (-3, 2) are linearly independent.

Let a, b be two numbers such that

a(1, 1) + b(-3, 2) = O.

Writing this equation in terms of components, we find

a - 3b = 0,   a + 2b = 0.

This is a system of two equations which we solve for a and b. Subtracting
the second from the first, we get -5b = 0, whence b = 0. Substituting
in either equation, we find a = 0. Hence a, b are both 0, and our
vectors are linearly independent.
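
For two vectors in the plane, the elimination above can be replaced by the criterion stated in Exercise 4 of this section: (a, b) and (c, d) are linearly independent exactly when ad - bc is not 0. A minimal Python sketch follows; the function name `independent_in_plane` is an assumption of this example, and the second pair of vectors is an arbitrary dependent pair added for contrast.

```python
def independent_in_plane(v, w):
    """True if the plane vectors v = (a, b) and w = (c, d) satisfy ad - bc != 0."""
    (a, b), (c, d) = v, w
    return a * d - b * c != 0

# The vectors of Example 4: 1*2 - 1*(-3) = 5, which is not 0.
example_4 = independent_in_plane((1, 1), (-3, 2))

# A dependent pair, since (2, 4) = 2 * (1, 2): 1*4 - 2*2 = 0.
dependent_pair = independent_in_plane((1, 2), (2, 4))
```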

Example 5. Find the coordinates of (1, 0) with respect to the two vectors
(1, 1) and (-1, 2), which form a basis.

We must find numbers a, b such that

a(1, 1) + b(-1, 2) = (1, 0).

Writing this equation in terms of coordinates, we find

a - b = 1,   a + 2b = 0.

Solving for a and b in the usual manner yields b = -1/3 and a = 2/3.
Hence the coordinates of (1, 0) with respect to (1, 1) and (-1, 2) are

(2/3, -1/3).
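
The computation of Example 5 can be reproduced with exact rational arithmetic. The Python sketch below solves the two equations by Cramer's rule for a system of two equations, a standard technique used here in place of the elimination "in the usual manner"; the function name `coordinates` is a hypothetical choice for this illustration.

```python
from fractions import Fraction

def coordinates(target, v, w):
    """Solve a*v + b*w = target for (a, b) in the plane by Cramer's rule."""
    (v1, v2), (w1, w2), (t1, t2) = v, w, target
    det = Fraction(v1 * w2 - v2 * w1)
    if det == 0:
        # No unique solution: v and w do not form a basis of the plane.
        raise ValueError("v and w are linearly dependent")
    a = (t1 * w2 - t2 * w1) / det
    b = (v1 * t2 - v2 * t1) / det
    return a, b

# Coordinates of (1, 0) with respect to (1, 1) and (-1, 2).
a, b = coordinates((1, 0), (1, 1), (-1, 2))
```

Substituting back, a(1, 1) + b(-1, 2) has first component a - b and second component a + 2b, which can be compared with (1, 0).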

Example 6. Show that the vectors (1, 1) and (-1, 2) form a basis of $R^2$.

We have to show that they are linearly independent and that they
generate $R^2$. To prove linear independence, suppose that a, b are
numbers such that

a(1, 1) + b(-1, 2) = (0, 0).

Then

a - b = 0,   a + 2b = 0.

Subtracting the first equation from the second yields 3b = 0, so that
b = 0. But then from the first equation, a = 0, thus proving that our
vectors are linearly independent. Next, let (a, b) be an arbitrary element
of $R^2$. We have to show that there exist numbers x, y such that

x(1, 1) + y(-1, 2) = (a, b).

In other words, we must solve the system of equations

x - y = a,
x + 2y = b.

Again subtract the first equation from the second. We find

3y = b - a,

whence

$$y = \frac{b - a}{3},$$

and finally

$$x = y + a = \frac{b - a}{3} + a.$$

This proves what we wanted. According to our definitions, (x, y) are the
coordinates of (a, b) with respect to the basis {(1, 1), (-1, 2)}.

Let $\{v_1, \ldots, v_n\}$ be a set of elements of a vector space V. Let r be a
positive integer $\leq n$. We shall say that $\{v_1, \ldots, v_r\}$ is a maximal subset of
linearly independent elements if $v_1, \ldots, v_r$ are linearly independent, and if
in addition, given any $v_i$ with i > r, the elements $v_1, \ldots, v_r, v_i$ are linearly
dependent.

The next theorem gives us a useful criterion to determine when a set
of elements of a vector space is a basis.

Theorem 2.2. Let $\{v_1, \ldots, v_n\}$ be a set of generators of a vector space V.
Let $\{v_1, \ldots, v_r\}$ be a maximal subset of linearly independent elements.
Then $\{v_1, \ldots, v_r\}$ is a basis of V.

Proof. We must prove that $v_1, \ldots, v_r$ generate V. We shall first prove
that each $v_i$ (for i > r) is a linear combination of $v_1, \ldots, v_r$. By hypothesis,
given $v_i$, there exist numbers $x_1, \ldots, x_r, y$ not all 0 such that

$$x_1 v_1 + \cdots + x_r v_r + y v_i = O.$$

Furthermore, $y \neq 0$, because otherwise, we would have a relation of linear
dependence for $v_1, \ldots, v_r$. Hence we can solve for $v_i$, namely

$$v_i = \frac{x_1}{-y} v_1 + \cdots + \frac{x_r}{-y} v_r,$$

thereby showing that $v_i$ is a linear combination of $v_1, \ldots, v_r$.

Next, let v be any element of V. There exist numbers $c_1, \ldots, c_n$ such
that

$$v = c_1 v_1 + \cdots + c_n v_n.$$

In this relation, we can replace each $v_i$ (i > r) by a linear combination of
$v_1, \ldots, v_r$. If we do this, and then collect terms, we find that we have expressed
v as a linear combination of $v_1, \ldots, v_r$. This proves that $v_1, \ldots, v_r$
generate V, and hence form a basis of V.
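
Theorem 2.2 suggests a way to extract a basis from a list of generators: run through the generators and keep each one that is linearly independent of those already kept. The Python sketch below does this for vectors in $Q^n$. It is an illustration under stated assumptions and not a method given in the text: the helper `rank` (a hypothetical name) counts pivots in a Gaussian elimination, a technique the Foreword says is covered in the introductory text, and the sample generators are arbitrary.

```python
from fractions import Fraction

def rank(vectors):
    """Number of pivots found by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        # Look for a row at or below position r with a non-zero entry in this column.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Clear the entries below the pivot.
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

def maximal_independent_subset(generators):
    """Greedily keep each generator that is independent of those already kept."""
    kept = []
    for v in generators:
        if rank(kept + [v]) == len(kept) + 1:
            kept.append(v)
    return kept

# Sample generators of Q^3; the second and fourth depend on earlier ones.
gens = [(1, 1, 0), (2, 2, 0), (0, 1, 1), (1, 2, 1), (0, 0, 1)]
basis = maximal_independent_subset(gens)
```

Every discarded generator is linearly dependent on the kept ones, so the kept vectors form a maximal subset of linearly independent elements in the sense of the theorem (after renumbering).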


I, §2. EXERCISES 


1. Show that the following vectors are linearly independent (over C or R).

(a) (1, 1, 1) and (0, 1, -2)

(b) (1, 0) and (1, 1)

(c) (-1, 1, 0) and (0, 1, 2)

(d) (2, -1) and (1, 0)

(e) ($\pi$, 0) and (0, 1)

(f) (1, 2) and (1, 3)

(g) (1, 1, 0), (1, 1, 1), and (0, 1, -1)

(h) (0, 1, 1), (0, 2, 1), and (1, 5, 3)


2. Express the given vector X as a linear combination of the given vectors A, B,
and find the coordinates of X with respect to A, B.

(a) X = (1, 0), A = (1, 1), B = (0, 1)

(b) X = (2, 1), A = (1, -1), B = (1, 1)

(c) X = (1, 1), A = (2, 1), B = (-1, 0)

(d) X = (4, 3), A = (2, 1), B = (-1, 0)


3. Find the coordinates of the vector X with respect to the vectors A, B, C.

(a) X = (1, 0, 0), A = (1, 1, 1), B = (-1, 1, 0), C = (1, 0, -1)

(b) X = (1, 1, 1), A = (0, 1, -1), B = (1, 1, 0), C = (1, 0, 2)

(c) X = (0, 0, 1), A = (1, 1, 1), B = (-1, 1, 0), C = (1, 0, -1)


4. Let (a, b) and (c, d) be two vectors in the plane. If ad - bc = 0, show that
they are linearly dependent. If $ad - bc \neq 0$, show that they are linearly independent.

5. Consider the vector space of all functions of a variable t. Show that the following
pairs of functions are linearly independent.

(a) 1, t (b) t, $t^2$ (c) t, $t^4$ (d) $e^t$, t (e) $te^t$, $e^{2t}$ (f) sin t, cos t (g) t, sin t
(h) sin t, sin 2t (i) cos t, cos 3t

6. Consider the vector space of functions defined for t > 0. Show that the following
pairs of functions are linearly independent.

(a) t, 1/t (b) $e^t$, log t

7. What are the coordinates of the function f(t) = 3 sin t + 5 cos t with respect
to the basis {sin t, cos t}?

8. Let D be the derivative d/dt. Let f(t) be as in Exercise 7. What are the 
coordinates of the function Df(t) with respect to the basis of Exercise 7? 

9. Let $A_1,\ldots,A_r$ be vectors in $\mathbf{R}^n$ and assume that they are mutually
perpendicular (i.e. any two of them are perpendicular), and that none of them is
equal to $O$. Prove that they are linearly independent.

10. Let $v$, $w$ be elements of a vector space and assume that $v \ne O$. If $v$, $w$ are
linearly dependent, show that there is a number $a$ such that $w = av$.


I, §3. DIMENSION OF A VECTOR SPACE 

The main result of this section is that any two bases of a vector space
have the same number of elements. To prove this, we first have an
intermediate result.

Theorem 3.1. Let $V$ be a vector space over the field $K$. Let $\{v_1,\ldots,v_m\}$
be a basis of $V$ over $K$. Let $w_1,\ldots,w_n$ be elements of $V$, and assume that
$n > m$. Then $w_1,\ldots,w_n$ are linearly dependent.

Proof. Assume that $w_1,\ldots,w_n$ are linearly independent. Since
$\{v_1,\ldots,v_m\}$ is a basis, there exist elements $a_1,\ldots,a_m \in K$ such that

$$w_1 = a_1v_1 + \cdots + a_mv_m.$$

By assumption, we know that $w_1 \ne O$, and hence some $a_i \ne 0$. After
renumbering $v_1,\ldots,v_m$ if necessary, we may assume without loss of
generality that say $a_1 \ne 0$. We can then solve for $v_1$, and get

$$a_1v_1 = w_1 - a_2v_2 - \cdots - a_mv_m,$$

$$v_1 = a_1^{-1}w_1 - a_1^{-1}a_2v_2 - \cdots - a_1^{-1}a_mv_m.$$

The subspace of $V$ generated by $w_1, v_2,\ldots,v_m$ contains $v_1$, and hence must
be all of $V$ since $v_1, v_2,\ldots,v_m$ generate $V$. The idea is now to continue
our procedure stepwise, and to replace successively $v_2, v_3,\ldots$ by
$w_2, w_3,\ldots$ until all the elements $v_1,\ldots,v_m$ are exhausted, and $w_1,\ldots,w_m$
generate $V$. Let us now assume by induction that there is an integer $r$
with $1 \le r < m$ such that, after a suitable renumbering of $v_1,\ldots,v_m$, the
elements $w_1,\ldots,w_r, v_{r+1},\ldots,v_m$ generate $V$. There exist elements

$$b_1,\ldots,b_r,\ c_{r+1},\ldots,c_m$$

in $K$ such that

$$w_{r+1} = b_1w_1 + \cdots + b_rw_r + c_{r+1}v_{r+1} + \cdots + c_mv_m.$$

We cannot have $c_j = 0$ for $j = r+1,\ldots,m$, for otherwise, we get a
relation of linear dependence between $w_1,\ldots,w_{r+1}$, contradicting our
assumption. After renumbering $v_{r+1},\ldots,v_m$ if necessary, we may assume without
loss of generality that say $c_{r+1} \ne 0$. We then obtain

$$c_{r+1}v_{r+1} = w_{r+1} - b_1w_1 - \cdots - b_rw_r - c_{r+2}v_{r+2} - \cdots - c_mv_m.$$

Dividing by $c_{r+1}$, we conclude that $v_{r+1}$ is in the subspace generated by
$w_1,\ldots,w_{r+1}, v_{r+2},\ldots,v_m$. By our induction assumption, it follows that
$w_1,\ldots,w_{r+1}, v_{r+2},\ldots,v_m$ generate $V$. Thus by induction, we have proved
that $w_1,\ldots,w_m$ generate $V$. If $n > m$, then there exist elements

$$d_1,\ldots,d_m \in K$$

such that

$$w_n = d_1w_1 + \cdots + d_mw_m,$$

thereby proving that $w_1,\ldots,w_n$ are linearly dependent. This proves our
theorem.
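
As an illustration of Theorem 3.1 (a sketch that is not part of the original text), the following Python code looks for an explicit relation of linear dependence among $n$ vectors of $\mathbf{Q}^m$. The function name `dependence_relation` and the choice of exact `Fraction` arithmetic with row reduction are assumptions made for this example; the proof above uses the exchange argument instead.

```python
from fractions import Fraction


def dependence_relation(vectors):
    """Return coefficients x, not all 0, with sum x[j] * vectors[j] = O,
    or None if the vectors are linearly independent."""
    n = len(vectors)        # number of vectors
    m = len(vectors[0])     # number of components of each vector
    # m x n matrix whose j-th column is vectors[j]
    rows = [[Fraction(vectors[j][i]) for j in range(n)] for i in range(m)]
    pivot_cols = []
    r = 0
    for c in range(n):
        if r == m:
            break
        # look for a row at or below r with a non-zero entry in column c
        p = next((i for i in range(r, m) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        pivot = rows[r][c]
        rows[r] = [x / pivot for x in rows[r]]
        for i in range(m):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivot_cols.append(c)
        r += 1
    free = [c for c in range(n) if c not in pivot_cols]
    if not free:
        return None
    # set one free coefficient to 1 and solve for the pivot coefficients
    f = free[0]
    x = [Fraction(0)] * n
    x[f] = Fraction(1)
    for row_index, c in enumerate(pivot_cols):
        x[c] = -rows[row_index][f]
    return x


# Three vectors in Q^2 (n = 3 > m = 2) must be linearly dependent.
relation = dependence_relation([(1, 2), (3, 4), (5, 6)])
print(relation)  # coefficients 1, -2, 1
```

Here $1 \cdot (1,2) - 2 \cdot (3,4) + 1 \cdot (5,6) = O$, as the theorem predicts for three vectors in a space of dimension 2.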

Theorem 3.2. Let $V$ be a vector space and suppose that one basis has $n$
elements, and another basis has $m$ elements. Then $m = n$.

Proof. We apply Theorem 3.1 to the two bases. Theorem 3.1 implies
that both alternatives $n > m$ and $m > n$ are impossible, and hence $m = n$.

Let V be a vector space having a basis consisting of n elements. We 
shall say that n is the dimension of V. If V consists of O alone, then V 
does not have a basis, and we shall say that V has dimension 0. 
Example 1. The vector space $\mathbf{R}^n$ has dimension $n$ over $\mathbf{R}$, the vector
space $\mathbf{C}^n$ has dimension $n$ over $\mathbf{C}$. More generally for any field $K$, the
vector space $K^n$ has dimension $n$ over $K$. Indeed, the $n$ vectors

$$(1, 0, \ldots, 0),\quad (0, 1, \ldots, 0),\quad \ldots,\quad (0, \ldots, 0, 1)$$

form a basis of $K^n$ over $K$.


The dimension of a vector space $V$ over $K$ will be denoted by $\dim_K V$,
or simply $\dim V$.

A vector space which has a basis consisting of a finite number of
elements, or the zero vector space, is called finite dimensional. Other vector
spaces are called infinite dimensional. It is possible to give a definition 
for an infinite basis. The reader may look it up in a more advanced text. 
In this book, whenever we speak of the dimension of a vector space in 
the sequel, it is assumed that this vector space is finite dimensional. 


Example 2. Let $K$ be a field. Then $K$ is a vector space over itself,
and it is of dimension 1. In fact, the element 1 of $K$ forms a basis of $K$
over $K$, because any element $x \in K$ has a unique expression as $x = x \cdot 1$.


Example 3. Let V be a vector space. A subspace of dimension 1 is 
called a line in V. A subspace of dimension 2 is called a plane in V. 


We shall now give criteria which allow us to tell when elements of a 
vector space constitute a basis. 

Let $v_1,\ldots,v_n$ be linearly independent elements of a vector space $V$. We
shall say that they form a maximal set of linearly independent elements of
$V$ if given any element $w$ of $V$, the elements $w, v_1,\ldots,v_n$ are linearly
dependent.


Theorem 3.3. Let $V$ be a vector space, and $\{v_1,\ldots,v_n\}$ a maximal set of
linearly independent elements of $V$. Then $\{v_1,\ldots,v_n\}$ is a basis of $V$.


Proof. We must show that $v_1,\ldots,v_n$ generate $V$, i.e. that every element
of $V$ can be expressed as a linear combination of $v_1,\ldots,v_n$. Let $w$ be an
element of $V$. The elements $w, v_1,\ldots,v_n$ of $V$ must be linearly dependent
by hypothesis, and hence there exist numbers $x_0, x_1,\ldots,x_n$ not all 0 such
that

$$x_0w + x_1v_1 + \cdots + x_nv_n = O.$$

We cannot have $x_0 = 0$, because if that were the case, we would obtain a
relation of linear dependence among $v_1,\ldots,v_n$. Therefore we can solve for
$w$ in terms of $v_1,\ldots,v_n$, namely

$$w = -\frac{x_1}{x_0}v_1 - \cdots - \frac{x_n}{x_0}v_n.$$

This proves that $w$ is a linear combination of $v_1,\ldots,v_n$, and hence that
$\{v_1,\ldots,v_n\}$ is a basis.

Theorem 3.4. Let $V$ be a vector space of dimension $n$, and let $v_1,\ldots,v_n$
be linearly independent elements of $V$. Then $v_1,\ldots,v_n$ constitute a basis
of $V$.

Proof. According to Theorem 3.1, $\{v_1,\ldots,v_n\}$ is a maximal set of
linearly independent elements of $V$. Hence it is a basis by Theorem 3.3.

Corollary 3.5. Let V be a vector space and let W be a subspace. If 
dim W = dim V then V = W. 

Proof A basis for W must also be a basis for V by Theorem 3.4. 

Corollary 3.6. Let $V$ be a vector space of dimension $n$. Let $r$ be a
positive integer with $r < n$, and let $v_1,\ldots,v_r$ be linearly independent elements
of $V$. Then one can find elements $v_{r+1},\ldots,v_n$ such that

$$\{v_1,\ldots,v_n\}$$

is a basis of $V$.

Proof. Since $r < n$ we know that $\{v_1,\ldots,v_r\}$ cannot form a basis of $V$,
and thus cannot be a maximal set of linearly independent elements of $V$.
In particular, we can find $v_{r+1}$ in $V$ such that

$$v_1,\ldots,v_{r+1}$$

are linearly independent. If $r + 1 < n$, we can repeat the argument. We
can thus proceed stepwise (by induction) until we obtain $n$ linearly
independent elements $\{v_1,\ldots,v_n\}$. These must be a basis by Theorem 3.4 and
our corollary is proved.
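
The stepwise procedure in this proof can be sketched in Python for $V = \mathbf{Q}^n$. This is an illustration under assumptions, not the text's own construction: the proof allows any new element outside the span, whereas the code below always tries the unit vectors in order. The names `rank` and `extend_to_basis` are hypothetical.

```python
from fractions import Fraction


def rank(vectors):
    """Number of linearly independent vectors in the list, by elimination over Q."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    rk = 0
    for c in range(ncols):
        p = next((i for i in range(rk, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[rk], rows[p] = rows[p], rows[rk]
        for i in range(rk + 1, len(rows)):
            factor = rows[i][c] / rows[rk][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
    return rk


def extend_to_basis(independent, n):
    """Append unit vectors of Q^n, one at a time, whenever the enlarged
    list stays linearly independent, until n vectors are reached."""
    basis = [list(v) for v in independent]
    for i in range(n):
        if len(basis) == n:
            break
        e = [1 if j == i else 0 for j in range(n)]
        if rank(basis + [e]) == len(basis) + 1:
            basis.append(e)
    return basis


print(extend_to_basis([(1, 1, 0)], 3))
# [[1, 1, 0], [1, 0, 0], [0, 0, 1]]
```

The unit vector $(0, 1, 0)$ is skipped because it equals $(1, 1, 0) - (1, 0, 0)$, so adding it would not keep the list linearly independent.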

Theorem 3.7. Let $V$ be a vector space having a basis consisting of $n$
elements. Let $W$ be a subspace which does not consist of $O$ alone. Then
$W$ has a basis, and the dimension of $W$ is $\le n$.

Proof. Let $w_1$ be a non-zero element of $W$. If $\{w_1\}$ is not a maximal
set of linearly independent elements of $W$, we can find an element $w_2$ of
$W$ such that $w_1, w_2$ are linearly independent. Proceeding in this manner,
one element at a time, there must be an integer $m \le n$ such that we can
find linearly independent elements $w_1, w_2,\ldots,w_m$, and such that

$$\{w_1,\ldots,w_m\}$$

is a maximal set of linearly independent elements of $W$ (by Theorem 3.1
we cannot go on indefinitely finding linearly independent elements, and
the number of such elements is at most $n$). If we now use Theorem 3.3,
we conclude that $\{w_1,\ldots,w_m\}$ is a basis for $W$.


I, §4. SUMS AND DIRECT SUMS 

Let $V$ be a vector space over the field $K$. Let $U$, $W$ be subspaces of $V$.
We define the sum of $U$ and $W$ to be the subset of $V$ consisting of all
sums $u + w$ with $u \in U$ and $w \in W$. We denote this sum by $U + W$. It is
a subspace of $V$. Indeed, if $u_1, u_2 \in U$ and $w_1, w_2 \in W$ then

$$(u_1 + w_1) + (u_2 + w_2) = u_1 + u_2 + w_1 + w_2 \in U + W.$$

If $c \in K$, then

$$c(u_1 + w_1) = cu_1 + cw_1 \in U + W.$$

Finally, $O + O \in U + W$. This proves that $U + W$ is a subspace.

We shall say that $V$ is a direct sum of $U$ and $W$ if for every element $v$
of $V$ there exist unique elements $u \in U$ and $w \in W$ such that $v = u + w$.

Theorem 4.1. Let $V$ be a vector space over the field $K$, and let $U$, $W$ be
subspaces. If $U + W = V$, and if $U \cap W = \{O\}$, then $V$ is the direct
sum of $U$ and $W$.

Proof. Given $v \in V$, by the first assumption, there exist elements $u \in U$
and $w \in W$ such that $v = u + w$. Thus $V$ is the sum of $U$ and $W$. To
prove it is the direct sum, we must show that these elements $u$, $w$ are
uniquely determined. Suppose there exist elements $u' \in U$ and $w' \in W$ such
that $v = u' + w'$. Thus

$$u + w = u' + w'.$$

Then

$$u - u' = w' - w.$$

But $u - u' \in U$ and $w' - w \in W$. By the second assumption, we conclude
that $u - u' = O$ and $w' - w = O$, whence $u = u'$ and $w = w'$, thereby
proving our theorem.

As a matter of notation, when $V$ is the direct sum of subspaces $U$, $W$
we write

$$V = U \oplus W.$$
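
To make the unique decomposition $v = u + w$ concrete, here is a small Python sketch for the plane $\mathbf{Q}^2$, where $U$ and $W$ are the lines generated by two linearly independent vectors `a` and `b`. The function `decompose` and the sample vectors are hypothetical choices for illustration; the coefficients are found with Cramer's rule, which the text has not introduced at this point.

```python
from fractions import Fraction


def decompose(v, a, b):
    """Write v = u + w with u a multiple of a and w a multiple of b,
    assuming a and b are linearly independent vectors of Q^2."""
    det = a[0] * b[1] - a[1] * b[0]
    if det == 0:
        raise ValueError("a and b are linearly dependent")
    # solve x*a + y*b = v by Cramer's rule
    x = Fraction(v[0] * b[1] - v[1] * b[0], det)
    y = Fraction(a[0] * v[1] - a[1] * v[0], det)
    u = (x * a[0], x * a[1])
    w = (y * b[0], y * b[1])
    return u, w


u, w = decompose((3, 1), (1, 1), (1, -1))
print(u, w)  # u = (2, 2) lies in U, w = (1, -1) lies in W
```

Because the determinant is non-zero, the coefficients `x` and `y` are the only possible ones, which is exactly the uniqueness required of a direct sum.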


Theorem 4.2. Let V be a finite dimensional vector space over the field 
K. Let W be a subspace. Then there exists a subspace U such that V is 
the direct sum of W and U. 

Proof. We select a basis of $W$, and extend it to a basis of $V$, using
Corollary 3.6. The assertion of our theorem is then clear. In the
notation of that corollary, if $\{v_1,\ldots,v_r\}$ is a basis of $W$, then we let $U$ be the
space generated by $\{v_{r+1},\ldots,v_n\}$.

We note that given the subspace $W$, there usually exist many
subspaces $U$ such that $V$ is the direct sum of $W$ and $U$. (For examples, see
the exercises.) In the section when we discuss orthogonality later in this
book, we shall use orthogonality to determine such a subspace.

Theorem 4.3. If $V$ is a finite dimensional vector space over $K$, and is
the direct sum of subspaces $U$, $W$ then

$$\dim V = \dim U + \dim W.$$

Proof. Let $\{u_1,\ldots,u_r\}$ be a basis of $U$, and $\{w_1,\ldots,w_s\}$ a basis of $W$.
Every element of $U$ has a unique expression as a linear combination
$x_1u_1 + \cdots + x_ru_r$, with $x_i \in K$, and every element of $W$ has a unique
expression as a linear combination $y_1w_1 + \cdots + y_sw_s$ with $y_j \in K$. Hence by
definition, every element of $V$ has a unique expression as a linear
combination

$$x_1u_1 + \cdots + x_ru_r + y_1w_1 + \cdots + y_sw_s,$$

thereby proving that $u_1,\ldots,u_r, w_1,\ldots,w_s$ is a basis of $V$, and also proving
our theorem.


Suppose now that $U$, $W$ are arbitrary vector spaces over the field $K$
(i.e. not necessarily subspaces of some vector space). We let $U \times W$ be
the set of all pairs $(u, w)$ whose first component is an element $u$ of $U$ and
whose second component is an element $w$ of $W$. We define the addition
of such pairs componentwise, namely, if $(u_1, w_1) \in U \times W$ and
$(u_2, w_2) \in U \times W$ we define

$$(u_1, w_1) + (u_2, w_2) = (u_1 + u_2, w_1 + w_2).$$

If $c \in K$ we define the product $c(u_1, w_1)$ by

$$c(u_1, w_1) = (cu_1, cw_1).$$

It is then immediately verified that $U \times W$ is a vector space, called the
direct product of $U$ and $W$. When we discuss linear maps, we shall
compare the direct product with the direct sum.

If $n$ is a positive integer, written as a sum of two positive integers,
$n = r + s$, then we see that $K^n$ is the direct product $K^r \times K^s$.

We note that

$$\dim (U \times W) = \dim U + \dim W.$$


The proof is easy, and is left to the reader. 

Of course, we can extend the notion of direct sum and direct product
to several factors. Let $V_1,\ldots,V_n$ be subspaces of a vector space $V$. We
say that $V$ is the direct sum

$$V = \bigoplus_{i=1}^{n} V_i = V_1 \oplus \cdots \oplus V_n$$

if every element $v \in V$ has a unique expression as a sum

$$v = v_1 + \cdots + v_n \quad\text{with } v_i \in V_i.$$

A “unique expression” means that if

$$v = v'_1 + \cdots + v'_n \quad\text{with } v'_i \in V_i$$

then $v'_i = v_i$ for $i = 1,\ldots,n$.

Similarly, let $W_1,\ldots,W_n$ be vector spaces. We define their direct
product

$$\prod_{i=1}^{n} W_i = W_1 \times \cdots \times W_n$$

to be the set of $n$-tuples $(w_1,\ldots,w_n)$ with $w_i \in W_i$. Addition is defined
componentwise, and multiplication by scalars is also defined
componentwise. Then this direct product is a vector space.
I, §4. EXERCISES

1. Let $V = \mathbf{R}^2$, and let $W$ be the subspace generated by $(2, 1)$. Let $U$ be the
subspace generated by $(0, 1)$. Show that $V$ is the direct sum of $W$ and $U$. If $U'$ is
the subspace generated by $(1, 1)$, show that $V$ is also the direct sum of $W$ and
$U'$.

2. Let $V = K^3$ for some field $K$. Let $W$ be the subspace generated by $(1, 0, 0)$,
and let $U$ be the subspace generated by $(1, 1, 0)$ and $(0, 1, 1)$. Show that $V$ is
the direct sum of $W$ and $U$.

3. Let $A$, $B$ be two vectors in $\mathbf{R}^2$, and assume neither of them is $O$. If there is
no number $c$ such that $cA = B$, show that $A$, $B$ form a basis of $\mathbf{R}^2$, and that
$\mathbf{R}^2$ is a direct sum of the subspaces generated by $A$ and $B$ respectively.

4. Prove the last assertion of the section concerning the dimension of $U \times W$. If
$\{u_1,\ldots,u_r\}$ is a basis of $U$ and $\{w_1,\ldots,w_s\}$ is a basis of $W$, what is a basis of
$U \times W$?
II, §1. THE SPACE OF MATRICES

We consider a new kind of object, matrices. Let $K$ be a field. Let $n$, $m$
be two integers $\ge 1$. An array of numbers in $K$

$$\begin{pmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n}\\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n}\\ \vdots & \vdots & \vdots & & \vdots\\ a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn} \end{pmatrix}$$

is called a matrix in $K$. We can abbreviate the notation for this matrix
by writing it $(a_{ij})$, $i = 1,\ldots,m$ and $j = 1,\ldots,n$. We say that it is an $m$ by
$n$ matrix, or an $m \times n$ matrix. The matrix has $m$ rows and $n$ columns.
For instance, the first column is

$$\begin{pmatrix} a_{11}\\ a_{21}\\ \vdots\\ a_{m1} \end{pmatrix}$$

and the second row is $(a_{21}, a_{22},\ldots,a_{2n})$. We call $a_{ij}$ the $ij$-entry or
$ij$-component of the matrix. If we denote by $A$ the above matrix, then the
$i$-th row is denoted by $A_i$, and is defined to be

$$A_i = (a_{i1}, a_{i2},\ldots,a_{in}).$$

The $j$-th column is denoted by $A^j$, and is defined to be

$$A^j = \begin{pmatrix} a_{1j}\\ a_{2j}\\ \vdots\\ a_{mj} \end{pmatrix}.$$

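Example 1. The following is a $2 \times 3$ matrix:

$$\begin{pmatrix} 1 & 1 & -2\\ -1 & 4 & -5 \end{pmatrix}.$$

It has two rows and three columns.

The rows are $(1, 1, -2)$ and $(-1, 4, -5)$. The columns are

$$\begin{pmatrix} 1\\ -1 \end{pmatrix},\qquad \begin{pmatrix} 1\\ 4 \end{pmatrix},\qquad \begin{pmatrix} -2\\ -5 \end{pmatrix}.$$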

Thus the rows of a matrix may be viewed as $n$-tuples, and the columns
may be viewed as vertical $m$-tuples. A vertical $m$-tuple is also called a
column vector.

A vector $(x_1,\ldots,x_n)$ is a $1 \times n$ matrix. A column vector

$$\begin{pmatrix} x_1\\ \vdots\\ x_n \end{pmatrix}$$

is an $n \times 1$ matrix.

When we write a matrix in the form $(a_{ij})$, then $i$ denotes the row and
$j$ denotes the column. In Example 1, we have for instance $a_{11} = 1$,
$a_{23} = -5$.

A single number (a) may be viewed as a 1 x 1 matrix. 

Let $(a_{ij})$, $i = 1,\ldots,m$ and $j = 1,\ldots,n$, be a matrix. If $m = n$, then we say
that it is a square matrix. Thus



are both square matrices. 
We have a zero matrix in which $a_{ij} = 0$ for all $i$, $j$. It looks like this:

$$\begin{pmatrix} 0 & 0 & 0 & \cdots & 0\\ 0 & 0 & 0 & \cdots & 0\\ \vdots & \vdots & \vdots & & \vdots\\ 0 & 0 & 0 & \cdots & 0 \end{pmatrix}.$$

We shall write it $O$. We note that we have met so far with the zero
number, zero vector, and zero matrix.


We shall now define addition of matrices and multiplication of
matrices by numbers.

We define addition of matrices only when they have the same size.
Thus let $m$, $n$ be fixed integers $\ge 1$. Let $A = (a_{ij})$ and $B = (b_{ij})$ be two
$m \times n$ matrices. We define $A + B$ to be the matrix whose entry in the
$i$-th row and $j$-th column is $a_{ij} + b_{ij}$. In other words, we add matrices of
the same size componentwise.

Example 2. Let 



If $O$ is the zero matrix, then for any matrix $A$ (of the same size, of
course), we have $O + A = A + O = A$. This is trivially verified.

We shall now define the multiplication of a matrix by a number. Let
$c$ be a number, and $A = (a_{ij})$ be a matrix. We define $cA$ to be the
matrix whose $ij$-component is $ca_{ij}$. We write $cA = (ca_{ij})$. Thus we multiply
each component of $A$ by $c$.

Example 3. Let A , B be as in Example 2. Let c = 2. Then 



We also have 


(-1 )A 



For all matrices $A$, we find that $A + (-1)A = O$.
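
The two operations just defined are easy to express in code. The following Python sketch represents a matrix as a list of rows; the function names `mat_add` and `scalar_mul` are chosen for this illustration only. The sample matrix is the $2 \times 3$ matrix of Example 1, whose rows are $(1, 1, -2)$ and $(-1, 4, -5)$.

```python
def mat_add(A, B):
    """Componentwise sum of two matrices of the same size."""
    if len(A) != len(B) or any(len(ra) != len(rb) for ra, rb in zip(A, B)):
        raise ValueError("matrices must have the same size")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def scalar_mul(c, A):
    """Multiply every component of A by the number c."""
    return [[c * a for a in row] for row in A]


A = [[1, 1, -2], [-1, 4, -5]]

print(scalar_mul(2, A))                # [[2, 2, -4], [-2, 8, -10]]
print(mat_add(A, scalar_mul(-1, A)))   # [[0, 0, 0], [0, 0, 0]]
```

The second line of output is the zero matrix, in agreement with $A + (-1)A = O$.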

We leave it as an exercise to verify that all properties VS 1 through 
VS 8 are satisfied by our rules for addition of matrices and multiplication
of matrices by elements of K. The main thing to observe here is that
addition of matrices is defined in terms of the components, and for the 
addition of components, the conditions analogous to VS 1 through VS 4 
are satisfied. They are standard properties of numbers. Similarly, VS 5 
through VS 8 are true for multiplication of matrices by elements of K , 
because the corresponding properties for the multiplication of elements of 
K are true. 

We see that the matrices (of a given size $m \times n$) with components in a
field $K$ form a vector space over $K$ which we may denote by
$\mathrm{Mat}_{m \times n}(K)$.

We define one more notion related to a matrix. Let $A = (a_{ij})$ be an
$m \times n$ matrix. The $n \times m$ matrix $B = (b_{ji})$ such that $b_{ji} = a_{ij}$ is called the
transpose of $A$, and is also denoted by ${}^tA$. Taking the transpose of a
matrix amounts to changing rows into columns and vice versa. If $A$ is
the matrix which we wrote down at the beginning of this section, then ${}^tA$
is the matrix

$$\begin{pmatrix} a_{11} & a_{21} & a_{31} & \cdots & a_{m1}\\ a_{12} & a_{22} & a_{32} & \cdots & a_{m2}\\ \vdots & \vdots & \vdots & & \vdots\\ a_{1n} & a_{2n} & a_{3n} & \cdots & a_{mn} \end{pmatrix}.$$

If $A = (2, 1, -4)$ is a row vector, then

$${}^tA = \begin{pmatrix} 2\\ 1\\ -4 \end{pmatrix}$$

is a column vector.

A matrix $A$ is said to be symmetric if it is equal to its transpose, i.e. if
${}^tA = A$. A symmetric matrix is necessarily a square matrix. For instance,
the matrix



is symmetric. 
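
As a small illustration of the transpose and of the symmetry condition, here is a Python sketch with matrices stored as lists of rows. The helper names `transpose` and `is_symmetric`, and the sample symmetric matrix, are assumptions for this example.

```python
def transpose(A):
    """Change rows into columns: the ji-component of the result is a_ij."""
    return [list(column) for column in zip(*A)]


def is_symmetric(A):
    """A matrix is symmetric when it equals its transpose."""
    return A == transpose(A)


print(transpose([[1, 1, -2], [-1, 4, -5]]))  # [[1, -1], [1, 4], [-2, -5]]
print(is_symmetric([[1, 2], [2, 3]]))        # True
print(is_symmetric([[1, 2, 3]]))             # False: not a square matrix
```

The last call shows why a symmetric matrix must be square: a $1 \times 3$ matrix and its $3 \times 1$ transpose do not even have the same size.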
Let $A = (a_{ij})$ be a square matrix. We call $a_{11},\ldots,a_{nn}$ its diagonal
components. A square matrix is said to be a diagonal matrix if all its
components are zero except possibly for the diagonal components, i.e. if
$a_{ij} = 0$ if $i \ne j$. Every diagonal matrix is a symmetric matrix. A diagonal
matrix looks like this:

$$\begin{pmatrix} a_1 & 0 & \cdots & 0\\ 0 & a_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & a_n \end{pmatrix}.$$

We define the unit $n \times n$ matrix to be the square matrix having all its
components equal to 0 except the diagonal components, equal to 1. We
denote this unit matrix by $I_n$, or $I$ if there is no need to specify the $n$.
Thus:

$$I_n = \begin{pmatrix} 1 & 0 & \cdots & 0\\ 0 & 1 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & 1 \end{pmatrix}.$$



II, §1. EXERCISES ON MATRICES 

1. Let 



Find A + B, 3 B, -2 B, A + 2B, 2A — B, A — 2 B, B - A. 
2. Let 



Find A + B, 3 B, -2 B, A + 2B, A - B, B - A. 

3. In Exercise 1, find ${}^tA$ and ${}^tB$.

4. In Exercise 2, find ${}^tA$ and ${}^tB$.

5. If $A$, $B$ are arbitrary $m \times n$ matrices, show that

$${}^t(A + B) = {}^tA + {}^tB.$$

6. If $c$ is a number, show that

$${}^t(cA) = c\,{}^tA.$$


7. If $A = (a_{ij})$ is a square matrix, then the elements $a_{ii}$ are called the diagonal
elements. How do the diagonal elements of $A$ and ${}^tA$ differ?

8. Find ${}^t(A + B)$ and ${}^tA + {}^tB$ in Exercise 2.

9. Find $A + {}^tA$ and $B + {}^tB$ in Exercise 2.

10. Show that for any square matrix $A$, the matrix $A + {}^tA$ is symmetric.

11. Write down the row vectors and column vectors of the matrices A, B in 
Exercise 1. 

12. Write down the row vectors and column vectors of the matrices A , B in 
Exercise 2. 


II, §1. EXERCISES ON DIMENSION 

1. What is the dimension of the space of 2 x 2 matrices? Give a basis for this 
space. 

2. What is the dimension of the space of $m \times n$ matrices? Give a basis for this
space.

3. What is the dimension of the space of $n \times n$ matrices all of whose
components are 0 except possibly the diagonal components?

4. What is the dimension of the space of $n \times n$ matrices which are
upper-triangular, i.e. of the following type:

$$\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ 0 & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & a_{nn} \end{pmatrix}?$$

5. What is the dimension of the space of symmetric $2 \times 2$ matrices (i.e. $2 \times 2$
matrices $A$ such that $A = {}^tA$)? Exhibit a basis for this space.

6. More generally, what is the dimension of the space of symmetric $n \times n$
matrices? What is a basis for this space?

7. What is the dimension of the space of diagonal n x n matrices? What is a 
basis for this space? 

8. Let $V$ be a subspace of $\mathbf{R}^2$. What are the possible dimensions for $V$?

9. Let $V$ be a subspace of $\mathbf{R}^3$. What are the possible dimensions for $V$?
II, §2. LINEAR EQUATIONS

We shall now give applications of the dimension theorems to the
solution of linear equations.

Let $K$ be a field. Let $A = (a_{ij})$, $i = 1,\ldots,m$ and $j = 1,\ldots,n$ be a matrix
in $K$. Let $b_1,\ldots,b_m$ be elements of $K$. Equations like

$$(*)\qquad \begin{aligned} a_{11}x_1 + \cdots + a_{1n}x_n &= b_1\\ &\ \,\vdots\\ a_{m1}x_1 + \cdots + a_{mn}x_n &= b_m \end{aligned}$$

are called linear equations. We shall also say that (*) is a system of
linear equations. The system is said to be homogeneous if all the numbers
$b_1,\ldots,b_m$ are equal to 0. The number $n$ is called the number of
unknowns, and $m$ is called the number of equations. We call $(a_{ij})$ the
matrix of coefficients.

The system of equations

$$(**)\qquad \begin{aligned} a_{11}x_1 + \cdots + a_{1n}x_n &= 0\\ &\ \,\vdots\\ a_{m1}x_1 + \cdots + a_{mn}x_n &= 0 \end{aligned}$$

will be called the homogeneous system associated with (*).

The system (**) always has a solution, namely, the solution
obtained by letting all $x_j = 0$. This solution will be called the trivial
solution. A solution $(x_1,\ldots,x_n)$ such that some $x_i \ne 0$ is called non-trivial.

We consider first the homogeneous system (**). We can rewrite it in
the following way:

$$x_1\begin{pmatrix} a_{11}\\ \vdots\\ a_{m1} \end{pmatrix} + \cdots + x_n\begin{pmatrix} a_{1n}\\ \vdots\\ a_{mn} \end{pmatrix} = O,$$

or in terms of the column vectors of the matrix $A = (a_{ij})$,

$$x_1A^1 + \cdots + x_nA^n = O.$$

A non-trivial solution $X = (x_1,\ldots,x_n)$ of our system (**) is therefore
nothing else than an $n$-tuple $X \ne O$ giving a relation of linear
dependence between the columns $A^1,\ldots,A^n$. This way of rewriting the system
gives us therefore a good interpretation, and allows us to apply Theorem
3.1 of Chapter I. The column vectors are elements of $K^m$, which has
dimension $m$ over $K$. Consequently:

Theorem 2.1. Let

$$\begin{aligned} a_{11}x_1 + \cdots + a_{1n}x_n &= 0\\ &\ \,\vdots\\ a_{m1}x_1 + \cdots + a_{mn}x_n &= 0 \end{aligned}$$

be a homogeneous system of $m$ linear equations in $n$ unknowns, with
coefficients in a field $K$. Assume that $n > m$. Then the system has a
non-trivial solution in $K$.

Proof. By Theorem 3.1 of Chapter I, we know that the vectors
$A^1,\ldots,A^n$ must be linearly dependent.
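
For a concrete instance of Theorem 2.1 (an illustrative example that is not taken from the original text), consider $m = 2$ equations in $n = 3$ unknowns with rational coefficients:

```latex
\begin{aligned}
x_1 + 2x_2 + 3x_3 &= 0\\
4x_1 + 5x_2 + 6x_3 &= 0
\end{aligned}
\qquad\Longrightarrow\qquad
X = (1, -2, 1) \ \text{is a non-trivial solution,}
\quad\text{since}\quad
1 - 4 + 3 = 0 \ \text{and}\ 4 - 10 + 6 = 0 .
```

In terms of columns, this says that $A^1 - 2A^2 + A^3 = O$, a relation of linear dependence among three vectors of $K^2$.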

Of course, to solve explicitly a system of linear equations, we have so
far no other method than the elementary method of elimination from
elementary school. Some computational aspects of solving linear equations
are discussed at length in my Introduction to Linear Algebra, and will
not be repeated here.

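We now consider the original system of equations (*). Let $B$ be the
column vector

$$B = \begin{pmatrix} b_1\\ \vdots\\ b_m \end{pmatrix}.$$

Then we may rewrite (*) in the form

$$x_1\begin{pmatrix} a_{11}\\ \vdots\\ a_{m1} \end{pmatrix} + \cdots + x_n\begin{pmatrix} a_{1n}\\ \vdots\\ a_{mn} \end{pmatrix} = \begin{pmatrix} b_1\\ \vdots\\ b_m \end{pmatrix},$$

or abbreviated in terms of the column vectors of $A$,

$$x_1A^1 + \cdots + x_nA^n = B.$$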


Theorem 2.2. Assume that $m = n$ in the system (*) above, and that the
vectors $A^1,\ldots,A^n$ are linearly independent. Then the system (*) has a
solution in $K$, and this solution is unique.

Proof. The vectors $A^1,\ldots,A^n$ being linearly independent, they form a
basis of $K^n$. Hence any vector $B$ has a unique expression as a linear
combination

$$B = x_1A^1 + \cdots + x_nA^n,$$

with $x_i \in K$, and $X = (x_1,\ldots,x_n)$ is therefore the unique solution of the
system.


II, §2. EXERCISES 

1. Let (**) be a system of homogeneous linear equations in a field $K$, and
assume that $m = n$. Assume also that the column vectors of coefficients are
linearly independent. Show that the only solution is the trivial solution.

2. Let (**) be a system of homogeneous linear equations in a field $K$, in $n$
unknowns. Show that the set of solutions $X = (x_1,\ldots,x_n)$ is a vector space over
$K$.

3. Let $A^1,\ldots,A^n$ be column vectors of size $m$. Assume that they have coefficients
in $\mathbf{R}$, and that they are linearly independent over $\mathbf{R}$. Show that they are
linearly independent over $\mathbf{C}$.

4. Let (**) be a system of homogeneous linear equations with coefficients in $\mathbf{R}$.
If this system has a non-trivial solution in $\mathbf{C}$, show that it has a non-trivial
solution in $\mathbf{R}$.


II, §3. MULTIPLICATION OF MATRICES 

We shall consider matrices over a field $K$. We begin by recalling the dot
product defined in Chapter I. Thus if $A = (a_1,\ldots,a_n)$ and $B = (b_1,\ldots,b_n)$
are in $K^n$, we define

$$A \cdot B = a_1b_1 + \cdots + a_nb_n.$$

This is an element of $K$. We have the basic properties:

SP 1. For all $A$, $B$ in $K^n$, we have $A \cdot B = B \cdot A$.

SP 2. If $A$, $B$, $C$ are in $K^n$, then

$$A \cdot (B + C) = A \cdot B + A \cdot C = (B + C) \cdot A.$$

SP 3. If $x \in K$, then

$$(xA) \cdot B = x(A \cdot B) \quad\text{and}\quad A \cdot (xB) = x(A \cdot B).$$
If $A$ has components in the real numbers $\mathbf{R}$, then

$$A^2 = a_1^2 + \cdots + a_n^2 \ge 0,$$

and if $A \ne O$ then $A^2 > 0$, because some $a_i^2 > 0$. Notice however that
the positivity property does not hold in general. For instance, if $K = \mathbf{C}$,
let $A = (1, i)$. Then $A \ne O$ but

$$A \cdot A = 1 + i^2 = 0.$$

For many applications, this positivity is not necessary, and one can use
instead a property which we shall call non-degeneracy, namely:

If $A \in K^n$, and if $A \cdot X = 0$ for all $X \in K^n$ then $A = O$.

The proof is trivial, because we must have $A \cdot E_i = 0$ for each unit vector
$E_i = (0,\ldots,0, 1, 0,\ldots,0)$ with 1 in the $i$-th component and 0 otherwise.
But $A \cdot E_i = a_i$, and hence $a_i = 0$ for all $i$, so that $A = O$.
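
Both remarks can be checked numerically with Python's built-in complex numbers. This is only a sketch: the names `dot` and `unit_vector` are hypothetical, and the code counts components from 0 whereas the text counts from 1.

```python
def dot(A, B):
    """Dot product a_1*b_1 + ... + a_n*b_n (no complex conjugation)."""
    if len(A) != len(B):
        raise ValueError("vectors must have the same number of components")
    return sum(a * b for a, b in zip(A, B))


def unit_vector(i, n):
    """Vector of length n with 1 in position i (0-based) and 0 elsewhere."""
    return [1 if j == i else 0 for j in range(n)]


A = [1, 1j]   # the vector (1, i) in C^2

print(dot(A, A) == 0)                                # True, although A is not O
print([dot(A, unit_vector(i, 2)) for i in range(2)]) # recovers the components of A
```

The second line shows the mechanism behind non-degeneracy: taking the dot product with the unit vectors reads off the components of $A$, so they cannot all vanish unless $A = O$.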

We shall now define the product of matrices. 

Let $A = (a_{ij})$, $i = 1,\ldots,m$ and $j = 1,\ldots,n$, be an $m \times n$ matrix. Let
$B = (b_{jk})$, $j = 1,\ldots,n$ and $k = 1,\ldots,s$, be an $n \times s$ matrix.

We define the product $AB$ to be the $m \times s$ matrix whose $ik$-coordinate is

$$\sum_{j=1}^{n} a_{ij}b_{jk} = a_{i1}b_{1k} + a_{i2}b_{2k} + \cdots + a_{in}b_{nk}.$$

If $A_1,\ldots,A_m$ are the row vectors of the matrix $A$, and if $B^1,\ldots,B^s$ are the
column vectors of the matrix $B$, then the $ik$-coordinate of the product
$AB$ is equal to $A_i \cdot B^k$. Thus

$$AB = \begin{pmatrix} A_1 \cdot B^1 & \cdots & A_1 \cdot B^s\\ \vdots & & \vdots\\ A_m \cdot B^1 & \cdots & A_m \cdot B^s \end{pmatrix}.$$

Multiplication of matrices is therefore a generalization of the dot
product.
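
The description of $AB$ as the array of dot products of rows of $A$ with columns of $B$ translates directly into code. The following Python sketch uses a hypothetical function `mat_mul` and sample matrices chosen for illustration; they are not asserted to be the matrices of the examples in the text.

```python
def mat_mul(A, B):
    """Product of an m x n matrix A and an n x s matrix B (lists of rows).
    The ik-entry is the dot product of the i-th row of A with the k-th column of B."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("columns of A must match rows of B")
    columns = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in columns]
            for row in A]


A = [[2, 1, 5],
     [1, 3, 2]]          # a 2 x 3 matrix
B = [[3, 4],
     [-1, 2],
     [2, 1]]             # a 3 x 2 matrix

print(mat_mul(A, B))     # [[15, 15], [4, 12]]
```

For instance the upper left entry is $(2, 1, 5) \cdot (3, -1, 2) = 6 - 1 + 10 = 15$.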
Example 1. Let



Then AB is a 2 x 2 matrix, and computations show that 



Example 2. Let 



Let A, B be as in Example 1. Then 



Compute $(AB)C$. What do you find?

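Let $A$ be an $m \times n$ matrix and let $B$ be an $n \times 1$ matrix, i.e. a column
vector. Then $AB$ is again a column vector. The product looks like this:

$$\begin{pmatrix} a_{11} & \cdots & a_{1n}\\ \vdots & & \vdots\\ a_{m1} & \cdots & a_{mn} \end{pmatrix}\begin{pmatrix} b_1\\ \vdots\\ b_n \end{pmatrix} = \begin{pmatrix} c_1\\ \vdots\\ c_m \end{pmatrix},$$

where

$$c_i = \sum_{j=1}^{n} a_{ij}b_j = a_{i1}b_1 + \cdots + a_{in}b_n.$$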
If $X = (x_1,\ldots,x_m)$ is a row vector, i.e. a $1 \times m$ matrix, then we can
form the product $XA$, which looks like this:

$$(x_1,\ldots,x_m)\begin{pmatrix} a_{11} & \cdots & a_{1n}\\ \vdots & & \vdots\\ a_{m1} & \cdots & a_{mn} \end{pmatrix} = (y_1,\ldots,y_n),$$

where

$$y_k = x_1a_{1k} + \cdots + x_ma_{mk}.$$

In this case, $XA$ is a $1 \times n$ matrix, i.e. a row vector.


Theorem 3.1. Let $A$, $B$, $C$ be matrices. Assume that $A$, $B$ can be
multiplied, and $A$, $C$ can be multiplied, and $B$, $C$ can be added. Then
$A$, $B + C$ can be multiplied, and we have

$$A(B + C) = AB + AC.$$

If $x$ is a number, then

$$A(xB) = x(AB).$$

Proof. Let $A_i$ be the $i$-th row of $A$ and let $B^k$, $C^k$ be the $k$-th column
of $B$ and $C$, respectively. Then $B^k + C^k$ is the $k$-th column of $B + C$.
By definition, the $ik$-component of $AB$ is $A_i \cdot B^k$, the $ik$-component of $AC$
is $A_i \cdot C^k$, and the $ik$-component of $A(B + C)$ is $A_i \cdot (B^k + C^k)$. Since

$$A_i \cdot (B^k + C^k) = A_i \cdot B^k + A_i \cdot C^k,$$

our first assertion follows. As for the second, observe that the $k$-th
column of $xB$ is $xB^k$. Since

$$A_i \cdot xB^k = x(A_i \cdot B^k),$$

our second assertion follows.

Theorem 3.2. Let $A$, $B$, $C$ be matrices such that $A$, $B$ can be multiplied
and $B$, $C$ can be multiplied. Then $A$, $BC$ can be multiplied. So can
$AB$, $C$, and we have

$$(AB)C = A(BC).$$

Proof. Let $A = (a_{ij})$ be an $m \times n$ matrix, let $B = (b_{jk})$ be an $n \times r$
matrix, and let $C = (c_{kl})$ be an $r \times s$ matrix. The product $AB$ is an $m \times r$
matrix, whose $ik$-component is equal to the sum

$$a_{i1}b_{1k} + a_{i2}b_{2k} + \cdots + a_{in}b_{nk}.$$

We shall abbreviate this sum using our $\sum$ notation by writing

$$\sum_{j=1}^{n} a_{ij}b_{jk}.$$

By definition, the $il$-component of $(AB)C$ is equal to

$$\sum_{k=1}^{r}\Bigl[\sum_{j=1}^{n} a_{ij}b_{jk}\Bigr]c_{kl} = \sum_{k=1}^{r}\Bigl[\sum_{j=1}^{n} a_{ij}b_{jk}c_{kl}\Bigr].$$

The sum on the right can also be described as the sum of all terms

$$\sum a_{ij}b_{jk}c_{kl},$$

where $j$, $k$ range over all integers $1 \le j \le n$ and $1 \le k \le r$ respectively.

If we had started with the $jl$-component of $BC$ and then computed the
$il$-component of $A(BC)$ we would have found exactly the same sum,
thereby proving the theorem.

Let $A$ be a square $n \times n$ matrix. We shall say that $A$ is invertible or
non-singular if there exists an $n \times n$ matrix $B$ such that

$$AB = BA = I_n.$$

Such a matrix $B$ is uniquely determined by $A$, for if $C$ is such that
$AC = CA = I_n$, then

$$B = BI_n = B(AC) = (BA)C = I_nC = C.$$

(Cf. Exercise 1.) This matrix $B$ will be called the inverse of $A$ and will be
denoted by $A^{-1}$. When we study determinants, we shall find an explicit
way of finding it, whenever it exists.

Let $A$ be a square matrix. Then we can form the product of $A$ with
itself, say $AA$, or repeated products,

$$A \cdots A$$

taken $m$ times. By definition, if $m$ is an integer $\ge 1$, we define $A^m$ to
be the product $A \cdots A$ taken $m$ times. We define $A^0 = I$ (the unit matrix
of the same size as $A$). The usual rule $A^{r+s} = A^rA^s$ holds for integers
$r, s \ge 0$.
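
The definition of $A^m$ by repeated multiplication, together with the convention $A^0 = I$, can be sketched as follows. The function names `mat_mul`, `identity` and `mat_pow` and the sample matrix are assumptions for this illustration.

```python
def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    columns = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in columns]
            for row in A]


def identity(n):
    """Unit n x n matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def mat_pow(A, m):
    """A multiplied with itself m times, with A^0 defined as the unit matrix."""
    if m < 0:
        raise ValueError("only exponents m >= 0 are defined here")
    result = identity(len(A))
    for _ in range(m):
        result = mat_mul(result, A)
    return result


A = [[2, 1],
     [0, 3]]

print(mat_pow(A, 0))   # [[1, 0], [0, 1]]
print(mat_pow(A, 3))   # [[8, 19], [0, 27]]
print(mat_pow(A, 5) == mat_mul(mat_pow(A, 2), mat_pow(A, 3)))  # True
```

The last line is one instance of the rule $A^{r+s} = A^rA^s$, which rests on the associativity proved in Theorem 3.2.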

The next result relates the transpose with multiplication of matrices.

Theorem 3.3. Let $A$, $B$ be matrices which can be multiplied. Then ${}^tB$, ${}^tA$
can be multiplied, and

$${}^t(AB) = {}^tB\,{}^tA.$$

Proof. Let $A = (a_{ij})$ and $B = (b_{jk})$. Let $AB = C$. Then

$$c_{ik} = \sum_{j=1}^{n} a_{ij}b_{jk}.$$

Let ${}^tB = (b'_{kj})$ and ${}^tA = (a'_{ji})$. Then the $ki$-component of ${}^tB\,{}^tA$ is by
definition

$$\sum_{j=1}^{n} b'_{kj}a'_{ji}.$$

Since $b'_{kj} = b_{jk}$ and $a'_{ji} = a_{ij}$ we see that this last expression is equal to

$$\sum_{j=1}^{n} b_{jk}a_{ij} = \sum_{j=1}^{n} a_{ij}b_{jk}.$$

By definition, this is the $ki$-component of ${}^tC$, as was to be shown.
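
A quick numerical check of the rule ${}^t(AB) = {}^tB\,{}^tA$ can be written in Python. The helpers `mat_mul` and `transpose` and the two sample matrices are hypothetical choices made for this sketch; a check on one pair of matrices illustrates the theorem and is of course no substitute for the proof.

```python
def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    columns = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in columns]
            for row in A]


def transpose(A):
    """Change rows into columns."""
    return [list(column) for column in zip(*A)]


A = [[2, 1, 5],
     [1, 3, 2]]          # 2 x 3
B = [[3, 4],
     [-1, 2],
     [2, 1]]             # 3 x 2

left = transpose(mat_mul(A, B))
right = mat_mul(transpose(B), transpose(A))
print(left)              # [[15, 4], [15, 12]]
print(left == right)     # True
```

Note the reversal of order: ${}^tA$ is $3 \times 2$ and ${}^tB$ is $2 \times 3$, so for these sizes the product ${}^tA\,{}^tB$ would be a $3 \times 3$ matrix and could not equal ${}^t(AB)$.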

In terms of multiplication of matrices, we can now write a system of
linear equations in the form

$$AX = B,$$

where $A$ is an $m \times n$ matrix, $X$ is a column vector of size $n$, and $B$ is a
column vector of size $m$.


II, §3. EXERCISES 

1. Let $I$ be the unit $n \times n$ matrix. Let $A$ be an $n \times r$ matrix. What is $IA$? If $A$
is an $m \times n$ matrix, what is $AI$?

2. Let $O$ be the matrix all of whose coordinates are 0. Let $A$ be a matrix of a
size such that the product $AO$ is defined. What is $AO$?
3. In each one of the following cases, find $(AB)C$ and $A(BC)$.

[The matrices $A$, $B$, $C$ for parts (a), (b), (c) are not recoverable from the source.]

4. Let $A$, $B$ be square matrices of the same size, and assume that $AB = BA$.
Show that $(A + B)^2 = A^2 + 2AB + B^2$, and

$$(A + B)(A - B) = A^2 - B^2,$$

using the properties of matrices stated in Theorem 3.1.

5. Let 



Find AB and BA. 

6. Let 



Let A, B be as in Exercise 5. Find CA , AC, CB, and BC. State the general 
rule including this exercise as a special case. 

7. Let X = (1, 0, 0) and let 


A = 



1 

0 

1 



What is XA? 

8. Let $X = (0, 1, 0)$, and let $A$ be an arbitrary $3 \times 3$ matrix. How would you
describe $XA$? What if $X = (0, 0, 1)$? Generalize to similar statements
concerning $n \times n$ matrices, and their products with unit vectors.

9. Let $A$, $B$ be the matrices of Exercise 3(a). Verify by computation that
${}^t(AB) = {}^tB\,{}^tA$. Do the same for 3(b) and 3(c). Prove the same rule for any
two matrices $A$, $B$ (which can be multiplied). If $A$, $B$, $C$ are matrices which
can be multiplied, show that ${}^t(ABC) = {}^tC\,{}^tB\,{}^tA$.

10. Let $M$ be an $n \times n$ matrix such that ${}^tM = M$. Given two row vectors in
$n$-space, say $A$ and $B$, define $\langle A, B \rangle$ to be $AM\,{}^tB$. (Identify a $1 \times 1$ matrix with
a number.) Show that the conditions of a scalar product are satisfied, except
possibly the condition concerning positivity. Give an example of a matrix $M$
and vectors $A$, $B$ such that $AM\,{}^tB$ is negative (taking $n = 2$).

11. (a) Let A be the matrix 



Find $A^2$, $A^3$. Generalize to $4 \times 4$ matrices.
(b) Let A be the matrix 



Compute $A^2$, $A^3$, $A^4$.

12. Let X be the indicated column vector, and A the indicated matrix. Find AX 
as a column vector. 



13. Let 





Find AX for each of the following values of X. 


(a) X = 



(b) X = 



(c) X = 


0 

0 

1 

14. Let



\2 1 8 / 


Find AX for each of the values of X given in Exercise 13. 

15. Let 



What is AX? 

16. Let X be a column vector having all its components equal to 0 except the 
i-th component which is equal to 1. Let A be an arbitrary matrix, whose size 
is such that we can form the product AX. What is AX? 

17. Let A = (a_ij), i = 1, ..., m and j = 1, ..., n, be an m x n matrix. Let
B = (b_jk), j = 1, ..., n and k = 1, ..., s, be an n x s matrix. Let AB = C.
Show that the k-th column C^k can be written

C^k = b_{1k} A^1 + ... + b_{nk} A^n.

(This will be useful in finding the determinant of a product.)

18. Let A be a square matrix. 

(a) If A^2 = O show that I - A is invertible.

(b) If A^3 = O show that I - A is invertible.

(c) In general, if A^n = O for some positive integer n, show that I - A is
invertible.

(d) Suppose that A^2 + 2A + I = O. Show that A is invertible.

(e) Suppose that A^3 - A + I = O. Show that A is invertible.

19. Let a , b be numbers, and let 



What is AB? What is A^n where n is a positive integer?

20. Show that the matrix A in Exercise 19 has an inverse. What is this inverse? 

21. Show that if A, B are n x n matrices which have inverses, then AB has an 
inverse. 

22. Determine all 2 x 2 matrices A such that A^2 = O.

23. Let

A = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.

Show that

A^2 = \begin{pmatrix} \cos 2\theta & -\sin 2\theta \\ \sin 2\theta & \cos 2\theta \end{pmatrix}.

Determine A^n by induction for any positive integer n.

24. Find a 2 x 2 matrix A such that A^2 = -I = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}.

25. Let A be an n x n matrix. Define the trace of A to be the sum of the
diagonal elements. Thus if A = (a_ij), then

tr(A) = \sum_{i=1}^{n} a_{ii}.

For instance, if 



then tr(A) = 1 + 4 = 5. If

A = \begin{pmatrix} 1 & -1 & 5 \\ 2 & 1 & 3 \\ 1 & -4 & 7 \end{pmatrix},

then tr(A) = 9. Compute the trace of the following matrices:

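(a) \begin{pmatrix} 1 & 7 & 3 \\ -1 & 5 & 2 \\ 2 & 3 & -4 \end{pmatrix}

(b) \begin{pmatrix} 3 & -2 & 4 \\ 1 & 4 & 1 \\ -7 & -3 & -3 \end{pmatrix}

(c) \begin{pmatrix} -2 & 1 & 1 \\ 3 & 4 & 4 \\ -5 & 2 & 6 \end{pmatrix}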




26. Let A, B be the indicated matrices. Show that

tr(AB) = tr(BA).

(a) A = \begin{pmatrix} 1 & -1 & 1 \\ 2 & 4 & 1 \\ 3 & 0 & 1 \end{pmatrix}, B = \begin{pmatrix} 3 & 1 & 2 \\ 1 & 1 & 0 \\ -1 & 2 & 1 \end{pmatrix}

/ 1 7 3 \ / 3 - 

(b) 4 = 1—1 5 2 )’ B = I 1 

27. Prove in general that if A, B are square n x n matrices, then

tr(AB) = tr(BA).

28. For any square matrix A, show that tr(A) = tr(^tA).

29. Let


Find A^2, A^3, A^4.




30. Let A be a diagonal matrix, with diagonal elements a_1, ..., a_n. What is
A^2, A^3, A^k for any positive integer k?


31. Let 


Find A^3.




32. Let A be an invertible n x n matrix. Show that

^t(A^{-1}) = (^tA)^{-1}.

We may therefore write ^tA^{-1} without fear of confusion.

33. Let A be a complex matrix, A = (a_ij), and let \bar{A} = (\bar{a}_ij), where
the bar means complex conjugate. Show that

^t(\bar{A}) = \overline{^tA}.

We then write simply ^t\bar{A}.

34. Let A be a diagonal matrix: 



If a_i ≠ 0 for all i, show that A is invertible. What is its inverse?

35. Let A be a strictly upper triangular matrix, i.e. a square matrix (a_ij) having
all its components below and on the diagonal equal to 0. We may express
this by writing a_ij = 0 if i ≥ j:



Prove that A^n = O. (If you wish, you may do it only in case n = 2, 3 and 4.
The general case can be done by induction.)

36. Let A be a triangular matrix with components 1 on the diagonal:



Let N = A - I. Show that N^{n+1} = O. Note that A = I + N. Show that A
is invertible, and that its inverse is

(I + N)^{-1} = I - N + N^2 - ... + (-1)^n N^n.


37. If N is a square matrix such that N^{r+1} = O for some positive integer r, show
that I - N is invertible and that its inverse is I + N + ... + N^r.

38. Let A be a triangular matrix:



Assume that no diagonal element is 0, and let 



Show that BA and AB are triangular matrices with components 1 on the 
diagonal. 

39. A square matrix A is said to be nilpotent if A^r = O for some integer r ≥ 1.
Let A , B be nilpotent matrices, of the same size, and assume AB = BA. 
Show that AB and A + B are nilpotent. 
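
The inverse formula of Exercise 37 can be checked numerically before it is proved. The following is only a sketch in Python under stated assumptions: matrices are plain lists of rows with integer entries, the helper names (matmul, identity, neumann_sum) are hypothetical, and the test matrix N is an arbitrary strictly upper triangular example that does not come from the text. A numerical check of this kind is not a proof.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def identity(n):
    """The n x n unit matrix I."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def neumann_sum(N, r):
    """Return I + N + N^2 + ... + N^r."""
    total = identity(len(N))
    power = identity(len(N))
    for _ in range(r):
        power = matmul(power, N)   # power is now the next power of N
        total = add(total, power)
    return total

# Assumed example: a strictly upper triangular 3 x 3 matrix, so N^3 = O.
N = [[0, 1, 2],
     [0, 0, 3],
     [0, 0, 0]]
I3 = identity(3)
candidate = neumann_sum(N, 2)              # I + N + N^2
left = matmul(sub(I3, N), candidate)       # (I - N)(I + N + N^2)
right = matmul(candidate, sub(I3, N))      # (I + N + N^2)(I - N)
```

Both products come out as the unit matrix for this N, which is what the exercise asks you to establish in general.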



CHAPTER III 


Linear Mappings 


We shall define the general notion of a mapping, which generalizes the 
notion of a function. Among mappings, the linear mappings are the 
most important. A good deal of mathematics is devoted to reducing 
questions concerning arbitrary mappings to linear mappings. For one 
thing, they are interesting in themselves, and many mappings are linear. 
On the other hand, it is often possible to approximate an arbitrary mapping
by a linear one, whose study is much easier than the study of the
original mapping. This is done in the calculus of several variables.


III, §1. MAPPINGS

Let S, S' be two sets. A mapping from S to S' is an association which 
to every element of S associates an element of S'. Instead of saying that 
F is a mapping from S into S', we shall often write the symbols F: S -> S'.
A mapping will also be called a map, for the sake of brevity. 

A function is a special type of mapping, namely it is a mapping from 
a set into the set of numbers, i.e. into R, or C, or into a field K. 

We extend to mappings some of the terminology we have used for
functions. For instance, if T: S -> S' is a mapping, and if u is an element
of S, then we denote by T(u), or Tu, the element of S' associated to u by
T. We call T(u) the value of T at u, or also the image of u under T.
The symbols T(u) are read "T of u". The set of all elements T(u), when
u ranges over all elements of S, is called the image of T. If W is a subset
of S, then the set of elements T(w), when w ranges over all elements of
W, is called the image of W under T, and is denoted by T(W).

Let F: S -> S' be a map from a set S into a set S'. If x is an element
of S, we often write

x |-> F(x)

with a special arrow |-> to denote the image of x under F. Thus, for
instance, we would speak of the map F such that F(x) = x^2 as the map
x |-> x^2.


Example 1. Let S and S' be both equal to R. Let f: R -> R be the
function f(x) = x^2 (i.e. the function whose value at a number x is
x^2). Then f is a mapping from R into R. Its image is the set of
numbers ≥ 0.

Example 2. Let S be the set of numbers ≥ 0, and let S' = R. Let
g: S -> S' be the function such that g(x) = x^{1/2}. Then g is a mapping
from S into R.


Example 3. Let S be the set of functions having derivatives of all
orders on the interval 0 < t < 1, and let S' = S. Then the derivative
D = d/dt is a mapping from S into S. Indeed, our map D associates the
function df/dt = Df to the function f. According to our terminology,
Df is the value of the mapping D at f.

Example 4. Let S be the set of continuous functions on the interval
[0, 1] and let S' be the set of differentiable functions on that interval.
We shall define a mapping J: S -> S' by giving its value at any function
f in S. Namely, we let Jf (or J(f)) be the function whose value at x is

(Jf)(x) = \int_0^x f(t) dt.

Then J(f) is a differentiable function.

Example 5. Let S be the set R^3, i.e. the set of 3-tuples. Let
A = (2, 3, -1). Let L: R^3 -> R be the mapping whose value at a vector
X = (x, y, z) is A · X. Then L(X) = A · X. If X = (1, 1, -1), then the
value of L at X is 6.

Just as we did with functions, we describe a mapping by giving its
values. Thus, instead of making the statement in Example 5 describing
the mapping L, we would also say: Let L: R^3 -> R be the mapping
L(X) = A · X. This is somewhat incorrect, but is briefer, and does not
usually give rise to confusion. More correctly, we can write X |-> L(X)
or X |-> A · X with the special arrow |-> to denote the effect of the map
L on the element X.

Example 6. Let F: R^2 -> R^2 be the mapping given by

F(x, y) = (2x, 2y).

Describe the image under F of the points lying on the circle x^2 + y^2 = 1.
Let (x, y) be a point on the circle of radius 1.

Let u = 2x and v = 2y. Then u, v satisfy the relation

(u/2)^2 + (v/2)^2 = 1

or in other words,

u^2 + v^2 = 4.

Hence (u, v) is a point on the circle of radius 2. Therefore the image
under F of the circle of radius 1 is a subset of the circle of radius 2.
Conversely, given a point (u, v) such that

u^2 + v^2 = 4,

let x = u/2 and y = v/2. Then the point (x, y) satisfies the equation
x^2 + y^2 = 1, and hence is a point on the circle of radius 1. Furthermore,
F(x, y) = (u, v). Hence every point on the circle of radius 2 is the image
of some point on the circle of radius 1. We conclude finally that the image
of the circle of radius 1 under F is precisely the circle of radius 2.
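
The two inclusions in the argument above can be spot-checked numerically. What follows is a small Python sketch, not part of the original argument: the sample angles, the tolerance, and the helper name on_circle are assumptions made for illustration, and checking finitely many points proves nothing in general.

```python
import math

def F(x, y):
    """The mapping of Example 6."""
    return (2 * x, 2 * y)

def on_circle(point, radius, tol=1e-12):
    """True if the point satisfies x^2 + y^2 = radius^2 up to rounding."""
    x, y = point
    return abs(x * x + y * y - radius ** 2) < tol

# First inclusion: images of points on the unit circle lie on the circle of radius 2.
angles = [2 * math.pi * k / 12 for k in range(12)]
unit_points = [(math.cos(a), math.sin(a)) for a in angles]
images = [F(x, y) for x, y in unit_points]
images_on_radius_2 = all(on_circle(p, 2) for p in images)

# Second inclusion: a point (u, v) on the circle of radius 2 is F(u/2, v/2).
u, v = 2 * math.cos(1.0), 2 * math.sin(1.0)
preimage = (u / 2, v / 2)
preimage_on_unit_circle = on_circle(preimage, 1)
```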

Note. In general, let S, S' be two sets. To prove that S = S', one
frequently proves that S is a subset of S' and that S' is a subset of S. This
is what we did in the preceding argument.

Example 7. Let S be a set and let V be a vector space over the field
K. Let F, G be mappings of S into V. We can define their sum F + G
as the map whose value at an element t of S is F(t) + G(t). We also
define the product of F by an element c of K to be the map whose value
at an element t of S is cF(t). It is easy to verify that conditions VS 1
through VS 8 are satisfied.

Example 8. Let S be a set. Let F: S -> K^n be a mapping. For each
element t of S, the value of F at t is a vector F(t). The coordinates of
F(t) depend on t. Hence there are functions f_1, ..., f_n of S into K such
that

F(t) = (f_1(t), ..., f_n(t)).

These functions are called the coordinate functions of F. For instance, if
K = R and if S is an interval of real numbers, which we denote by J,
then a map

F: J -> R^n

is also called a (parametric) curve in n-space.

Let S be an arbitrary set again, and let F, G: S -> K^n be mappings of S
into K^n. Let f_1, ..., f_n be the coordinate functions of F, and g_1, ..., g_n the
coordinate functions of G. Then G(t) = (g_1(t), ..., g_n(t)) for all t ∈ S.
Furthermore,

(F + G)(t) = F(t) + G(t) = (f_1(t) + g_1(t), ..., f_n(t) + g_n(t)),

and for any c ∈ K,

(cF)(t) = cF(t) = (c f_1(t), ..., c f_n(t)).

We see in particular that the coordinate functions of F + G are

f_1 + g_1, ..., f_n + g_n.

Example 9. We can define a map F: R -> R^3 by the association

t |-> (2t, 10^t, t^3).

Thus F(t) = (2t, 10^t, t^3), and F(2) = (4, 100, 8). The coordinate functions
of F are the functions f_1, f_2, f_3 such that

f_1(t) = 2t, f_2(t) = 10^t and f_3(t) = t^3.
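
The map of Example 9 and the sum of two maps into R^3 can be written out directly. This is a minimal Python sketch; the second map G and the helper names add_maps and scale_map are hypothetical and serve only to illustrate that sums and scalar multiples are formed coordinate by coordinate.

```python
def F(t):
    """The map t |-> (2t, 10^t, t^3) of Example 9."""
    return (2 * t, 10 ** t, t ** 3)

# The coordinate functions of F.
coordinate_functions = (lambda t: 2 * t, lambda t: 10 ** t, lambda t: t ** 3)

def G(t):
    """A second map R -> R^3, chosen only for illustration."""
    return (t, 1, -t)

def add_maps(F, G):
    """The map F + G, whose value at t is F(t) + G(t)."""
    return lambda t: tuple(a + b for a, b in zip(F(t), G(t)))

def scale_map(c, F):
    """The map cF, whose value at t is c F(t)."""
    return lambda t: tuple(c * a for a in F(t))

value_at_2 = F(2)                      # (4, 100, 8), as in the text
sum_at_2 = add_maps(F, G)(2)           # (4 + 2, 100 + 1, 8 - 2)
triple_at_2 = scale_map(3, F)(2)       # (12, 300, 24)
```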

Let U, V, W be sets. Let F: U -> V and G: V -> W be mappings. Then
we can form the composite mapping from U into W, denoted by G ∘ F.
It is by definition the mapping defined by

(G ∘ F)(t) = G(F(t))

for all t ∈ U. If f: R -> R is a function and g: R -> R is also a function,
then g ∘ f is the composite function.

The following statement is an important property of mappings.
Let U, V, W, S be sets. Let

F: U -> V, G: V -> W, and H: W -> S

be mappings. Then

H ∘ (G ∘ F) = (H ∘ G) ∘ F.

Proof. Here again, the proof is very simple. By definition, we have,
for any element u of U:

(H ∘ (G ∘ F))(u) = H((G ∘ F)(u)) = H(G(F(u))).

On the other hand,

((H ∘ G) ∘ F)(u) = (H ∘ G)(F(u)) = H(G(F(u))).

By definition, this means that

H ∘ (G ∘ F) = (H ∘ G) ∘ F.
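
Associativity of composition is easy to observe on concrete maps. The sketch below is in Python; the three maps F, G, H and the helper compose are assumptions made for illustration, and the comparison is only carried out on a finite sample of arguments.

```python
def compose(g, f):
    """Return the composite map g ∘ f, whose value at t is g(f(t))."""
    return lambda t: g(f(t))

# Three sample maps from the integers to the integers (illustrative only).
F = lambda u: u + 1
G = lambda v: 2 * v
H = lambda w: w ** 2

left = compose(H, compose(G, F))    # H ∘ (G ∘ F)
right = compose(compose(H, G), F)   # (H ∘ G) ∘ F

agree_on_sample = all(left(u) == right(u) for u in range(-5, 6))
```

Both composites send u to H(G(F(u))) = (2(u + 1))^2, which is exactly the computation in the proof.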


We shall discuss inverse mappings, but before that, we need to mention
two special properties which a mapping may have. Let

f: S -> S'

be a map. We say that f is injective if whenever x, y ∈ S and x ≠ y, then
f(x) ≠ f(y). In other words, f is injective means that f takes on distinct
values at distinct elements of S. Put another way, we can say that f is
injective if and only if, given x, y ∈ S,

f(x) = f(y) implies x = y.

Example 10. The function

f: R -> R

such that f(x) = x^2 is not injective, because f(1) = f(-1) = 1. Also the
function x |-> sin x is not injective, because sin x = sin(x + 2π). However,
the map f: R -> R such that f(x) = x + 1 is injective, because if
x + 1 = y + 1 then x = y.

Again, let f: S -> S' be a mapping. We shall say that f is surjective if
the image of f is all of S'.

The map

f: R -> R

such that f(x) = x^2 is not surjective, because its image consists of all
numbers ≥ 0, and this image is not equal to all of R. On the other
hand, the map of R into R given by x |-> x^3 is surjective, because given a
number y there exists a number x such that y = x^3 (the cube root of y).
Thus every number is in the image of our map.

A map which is both injective and surjective is defined to be bijective. 

Let R^+ be the set of real numbers ≥ 0. As a matter of convention,
we agree to distinguish between the maps

R -> R and R^+ -> R^+

given by the same formula x |-> x^2. The point is that when we view the
association x |-> x^2 as a map of R into R, then it is not surjective, and it
is not injective. But when we view this formula as defining a map from
R^+ into R^+, then it gives both an injective and surjective map of R^+
into itself, because every number ≥ 0 has a square root ≥ 0, and
such a square root is uniquely determined.
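
The dependence of injectivity and surjectivity on the chosen sets can be seen on a finite analogue. The following Python sketch replaces R and R^+ by small finite sets of integers, which is an assumption made so that the properties can be checked by enumeration; the helper names is_injective and is_surjective are hypothetical.

```python
def is_injective(f, domain):
    """True if f takes distinct values at distinct elements of the domain."""
    values = [f(x) for x in domain]
    return len(values) == len(set(values))

def is_surjective(f, domain, codomain):
    """True if the image of f is all of the codomain."""
    return {f(x) for x in domain} == set(codomain)

square = lambda x: x * x

# Finite stand-in for R -> R: both signs allowed in source and target.
S_all = range(-3, 4)             # -3, ..., 3
T_all = range(-9, 10)            # -9, ..., 9

# Finite stand-in for R^+ -> R^+: only numbers >= 0, target cut down to squares.
S_nonneg = range(0, 4)           # 0, 1, 2, 3
T_nonneg = [0, 1, 4, 9]

same_formula_results = {
    "all": (is_injective(square, S_all), is_surjective(square, S_all, T_all)),
    "nonneg": (is_injective(square, S_nonneg),
               is_surjective(square, S_nonneg, T_nonneg)),
}
```

The same formula x |-> x^2 is neither injective nor surjective for the first pair of sets, and is both for the second pair.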

In general, when dealing with a map f: S -> S', we must therefore always
specify the sets S and S', to be able to say that f is injective, or
surjective, or neither. To have a completely accurate notation, we should
write

f_{S,S'}

or some such symbol which specifies S and S' in the notation, but this
becomes too clumsy, and we prefer to use the context to make our
meaning clear.

If S is any set, the identity mapping I_S is defined to be the map such
that I_S(x) = x for all x ∈ S. We note that the identity map is both
injective and surjective. If we do not need to specify the reference to S
(because it is made clear by the context), then we write I instead of I_S.
Thus we have I(x) = x for all x ∈ S. We sometimes denote I_S by id_S or
simply id.

Finally, we define inverse mappings. Let F: S -> S' be a mapping from
one set into another set. We say that F has an inverse if there exists a
mapping G: S' -> S such that

G ∘ F = I_S and F ∘ G = I_{S'}.

By this we mean that the composite maps G ∘ F and F ∘ G are the identity
mappings of S and S' respectively.

Example 11. Let S = S' be the set of all real numbers ≥ 0. Let

f: S -> S'

be the map such that f(x) = x^2. Then f has an inverse mapping, namely
the map g: S -> S such that g(x) = \sqrt{x}.

Example 12. Let R_{>0} be the set of numbers > 0 and let f: R -> R_{>0}
be the map such that f(x) = e^x. Then f has an inverse mapping which is
nothing but the logarithm.

Example 13. This example is particularly important in geometric
applications. Let V be a vector space, and let u be a fixed element of V.
We let

T_u: V -> V

be the map such that T_u(v) = v + u. We call T_u the translation by u. If S
is any subset of V, then T_u(S) is called the translation of S by u, and
consists of all vectors v + u, with v ∈ S. We often denote it by S + u. In the
next picture, we draw a set S and its translation by a vector u.



As exercises, we leave the proofs of the next two statements to the
reader:

If u_1, u_2 are elements of V, then T_{u_1 + u_2} = T_{u_1} ∘ T_{u_2}.

If u is an element of V, then T_u: V -> V has an inverse mapping which is
nothing but the translation T_{-u}.

Next, we have:

Let

f: S -> S'

be a map which has an inverse mapping g. Then f is both injective and
surjective, that is f is bijective.

Proof. Let x, y ∈ S. Let g: S' -> S be the inverse mapping of f. If
f(x) = f(y), then we must have

x = g(f(x)) = g(f(y)) = y,

and therefore f is injective. To prove that f is surjective, let z ∈ S'. Then

f(g(z)) = z

by definition of the inverse mapping, and hence z = f(x), where x = g(z).
This proves that f is surjective.

The converse of the statement we just proved is also true, namely: 

Let f: S -> S' be a map which is bijective. Then f has an inverse
mapping.

Proof. Given z ∈ S', since f is surjective, there exists x ∈ S such that
f(x) = z. Since f is injective, this element x is uniquely determined by z,
and we can therefore define

g(z) = x.

By definition of g, we find that f(g(z)) = z, and g(f(x)) = x, so that g is
an inverse mapping for f.

Thus we can say that a map f: S -> S' has an inverse mapping if and
only if f is bijective.
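
For finite sets, the construction of g in the proof above can be carried out literally: record, for each z, the unique x with f(x) = z. The following Python sketch does this under the assumption that S and S' are finite collections; the function name inverse_mapping and the sample maps are hypothetical.

```python
def inverse_mapping(f, S, S_prime):
    """Return g: S' -> S with g(f(x)) = x and f(g(z)) = z,
    or None when f: S -> S' is not bijective."""
    table = {}
    for x in S:
        z = f(x)
        if z in table:            # two elements with the same image: not injective
            return None
        table[z] = x              # the unique x with f(x) = z
    if set(table) != set(S_prime):
        return None               # some element of S' is not in the image
    return lambda z: table[z]

S = range(5)
shift = lambda x: (x + 1) % 5                 # a bijection of {0, ..., 4}
g = inverse_mapping(shift, S, S)

square = lambda x: x * x
no_inverse = inverse_mapping(square, range(-2, 3), [0, 1, 4])   # not injective
```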


III, §1. EXERCISES

1. In Example 3, give Df as a function of x when f is the function:
(a) f(x) = sin x (b) f(x) = e^x (c) f(x) = log x

2. Prove the statement about translations in Example 13.

3. In Example 5, give L(X) when X is the vector:

(a) (1, 2, -3) (b) (-1, 5, 0) (c) (2, 1, 1)

4. Let F: R -> R^2 be the mapping such that F(t) = (e^t, t). What is F(1), F(0),
F(-1)?

5. Let G: R -> R^2 be the mapping such that G(t) = (t, 2t). Let F be as in
Exercise 4. What is (F + G)(1), (F + G)(2), (F + G)(0)?

6. Let F be as in Exercise 4. What is (2F)(0), (πF)(1)?

7. Let A = (1, 1, -1, 3). Let F: R^4 -> R be the mapping such that for any
vector X = (x_1, x_2, x_3, x_4) we have F(X) = X · A + 2. What is the value of F(X)
when (a) X = (1, 1, 0, -1) and (b) X = (2, 3, -1, 1)?

In Exercises 8 through 12, refer to Example 6. In each case, to prove that the
image is equal to a certain set S, you must prove that the image is contained in
S, and also that every element of S is in the image.

8. Let F: R^2 -> R^2 be the mapping defined by F(x, y) = (2x, 3y). Describe the
image of the points lying on the circle x^2 + y^2 = 1.

9. Let F: R^2 -> R^2 be the mapping defined by F(x, y) = (xy, y). Describe the
image under F of the straight line x = 2.

10. Let F be the mapping defined by F(x, y) = (e^x cos y, e^x sin y). Describe the
image under F of the line x = 1. Describe more generally the image under F
of a line x = c, where c is a constant.

11. Let F be the mapping defined by F(t, u) = (cos t, sin t, u). Describe
geometrically the image of the (t, u)-plane under F.

12. Let F be the mapping defined by F(x, y) = (x/3, x/4). What is the image 
under F of the ellipse 



III, §2. LINEAR MAPPINGS 

Let V, V' be vector spaces over the field K. A linear mapping

F: V -> V'

is a mapping which satisfies the following two properties.

LM 1. For any elements u , v in V we have 

F(u + v) = F(u ) + F(v). 

LM 2. For all c in K and v in V we have 


F(cv) = cF(v). 

If we wish to specify the field K, we also say that F is K-linear. Since
we usually deal with a fixed field K, we omit the prefix K, and say
simply that F is linear.

Example 1. Let V be a finite dimensional space over K, and let
{v_1, ..., v_n} be a basis of V. We define a map

F: V -> K^n

by associating to each element v ∈ V its coordinate vector X with respect
to the basis. Thus if

v = x_1 v_1 + ... + x_n v_n,

with x_i ∈ K, we let

F(v) = (x_1, ..., x_n).

We assert that F is a linear map. If

w = y_1 v_1 + ... + y_n v_n,

with coordinate vector Y = (y_1, ..., y_n), then

v + w = (x_1 + y_1)v_1 + ... + (x_n + y_n)v_n,

whence F(v + w) = X + Y = F(v) + F(w). If c ∈ K, then

cv = c x_1 v_1 + ... + c x_n v_n,

and hence F(cv) = cX = cF(v). This proves that F is linear.

Example 2. Let V = R^3 be the vector space (over R) of vectors in
3-space. Let V' = R^2 be the vector space of vectors in 2-space. We can
define a mapping

F: R^3 -> R^2

by the projection, namely F(x, y, z) = (x, y). We leave it to you to check
that the conditions LM 1 and LM 2 are satisfied.

More generally, let r, n be positive integers, r < n. Then we have a
projection mapping

F: K^n -> K^r

defined by the rule

F(x_1, ..., x_n) = (x_1, ..., x_r).

It is trivially verified that this map is linear.
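
Conditions LM 1 and LM 2 for the projection can be checked on sample vectors. This is a minimal Python sketch, with vectors represented as tuples; the helper names project, vadd and smul and the particular vectors are assumptions chosen for illustration. A check on samples illustrates the conditions but does not replace the verification.

```python
def project(x, r):
    """The projection K^n -> K^r keeping the first r coordinates."""
    return tuple(x[:r])

def vadd(u, v):
    """Componentwise sum of two vectors."""
    return tuple(a + b for a, b in zip(u, v))

def smul(c, v):
    """Product of the number c with the vector v."""
    return tuple(c * a for a in v)

u = (1, 2, 3)
v = (4, -1, 0)
c = 5

# LM 1: F(u + v) = F(u) + F(v)
lm1_holds = project(vadd(u, v), 2) == vadd(project(u, 2), project(v, 2))
# LM 2: F(cu) = cF(u)
lm2_holds = project(smul(c, u), 2) == smul(c, project(u, 2))
```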

Example 3. Let A = (1, 2, -1). Let V = R^3 and V' = R. We can
define a mapping L = L_A: R^3 -> R by the association X |-> X · A, i.e.

L(X) = X · A

for any vector X in 3-space. The fact that L is linear summarizes two
known properties of the scalar product, namely, for any vectors X, Y in
R^3 we have

(X + Y) · A = X · A + Y · A,
(cX) · A = c(X · A).

More generally, let K be a field, and A a fixed vector in K^n. We have
a linear map (i.e. K-linear map)

L_A: K^n -> K

such that L_A(X) = X · A for all X ∈ K^n.

We can even generalize this to matrices. Let A be an m x n matrix in
a field K. We obtain a linear map

L_A: K^n -> K^m

such that

L_A(X) = AX

for every column vector X in K^n. Again the linearity follows from
properties of multiplication of matrices. If A = (a_ij) then AX looks like this:

\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} a_{11}x_1 + \cdots + a_{1n}x_n \\ \vdots \\ a_{m1}x_1 + \cdots + a_{mn}x_n \end{pmatrix}.

This type of multiplication will be met frequently in the sequel.
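
The map X |-> AX can be computed row by row, each coordinate of AX being the product of a row of A with X. The Python sketch below assumes matrices are lists of rows and column vectors are plain lists; the matrix A, the vectors X, Y and the helper name apply_matrix are illustrative choices, not taken from the text.

```python
def apply_matrix(A, X):
    """Return AX, where A is a list of rows and X is a column vector (as a list)."""
    return [sum(a * x for a, x in zip(row, X)) for row in A]

# A is 2 x 3, so L_A maps K^3 into K^2.
A = [[1, 2, 0],
     [0, -1, 3]]
X = [1, 0, 2]
Y = [3, 1, -1]
c = 4

AX = apply_matrix(A, X)                       # [1, 6]
AY = apply_matrix(A, Y)                       # [5, -4]
sum_image = apply_matrix(A, [x + y for x, y in zip(X, Y)])
scaled_image = apply_matrix(A, [c * x for x in X])

additive = sum_image == [p + q for p, q in zip(AX, AY)]    # A(X + Y) = AX + AY
homogeneous = scaled_image == [c * p for p in AX]          # A(cX) = c(AX)
```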

Example 4. Let V be any vector space. The mapping which associates
to any element u of V this element itself is obviously a linear mapping,
which is called the identity mapping. We denote it by id or simply I.
Thus id(u) = u.

Example 5. Let V, V' be any vector spaces over the field K. The
mapping which associates the element O in V' to any element u of V is
called the zero mapping and is obviously linear. It is also denoted by O.

As an exercise (Exercise 2) prove:

Let L: V -> W be a linear map. Then L(O) = O.

In particular, if F: V -> W is a mapping and F(O) ≠ O then F is not
linear.

Example 6. The space of linear maps. Let V, V' be two vector spaces
over the field K. We consider the set of all linear mappings from V into
V', and denote this set by ℒ(V, V'), or simply ℒ if the reference to V, V'
is clear. We shall define the addition of linear mappings and their
multiplication by numbers in such a way as to make ℒ into a vector space.

Let T: V -> V' and F: V -> V' be two linear mappings. We define
their sum T + F to be the map whose value at an element u of V is
T(u) + F(u). Thus we may write

(T + F)(u) = T(u) + F(u).

The map T + F is then a linear map. Indeed, it is easy to verify that the
two conditions which define a linear map are satisfied. For any elements
u, v of V, we have

(T + F)(u + v) = T(u + v) + F(u + v)
               = T(u) + T(v) + F(u) + F(v)
               = T(u) + F(u) + T(v) + F(v)
               = (T + F)(u) + (T + F)(v).

Furthermore, if c ∈ K, then

(T + F)(cu) = T(cu) + F(cu)
            = cT(u) + cF(u)
            = c[T(u) + F(u)]
            = c[(T + F)(u)].

Hence T + F is a linear map.

If a ∈ K, and T: V -> V' is a linear map, we define a map aT from V
into V' by giving its value at an element u of V, namely (aT)(u) = aT(u).
Then it is easily verified that aT is a linear map. We leave this as an
exercise.

We have just defined operations of addition and scalar multiplication
in our set ℒ. Furthermore, if T: V -> V' is a linear map, i.e. an element
of ℒ, then we can define -T to be (-1)T, i.e. the product of the
number -1 by T. Finally, we have the zero-map, which to every element
of V associates the element O of V'. Then ℒ is a vector space. In
other words, the set of linear maps from V into V' is itself a vector
space. The verification that the rules VS 1 through VS 8 for a vector
space are satisfied is easy and left to the reader.

Example 7. Let V = V' be the vector space of real valued functions of
a real variable which have derivatives of all orders. Let D be the
derivative. Then D: V -> V is a linear map. This is merely a brief way of
summarizing known properties of the derivative, namely

D(f + g) = Df + Dg, and D(cf) = cDf

for any differentiable functions f, g and constant c. If f is in V, and I is
the identity map, then

(D + I)f = Df + f.

Thus when f is the function such that f(x) = e^x then (D + I)f is the
function whose value at x is e^x + e^x = 2e^x.

If f(x) = sin x, then ((D + I)f)(x) = cos x + sin x.
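
The linearity of D and the map D + I can be made concrete on polynomials, which form a subspace of the space V above. In the Python sketch below a polynomial a_0 + a_1 x + a_2 x^2 + ... is stored as the coefficient list [a_0, a_1, a_2, ...]; this representation and the helper names (D, padd, pscale, D_plus_I) are assumptions made for illustration, and the examples with e^x and sin x from the text are not covered by it.

```python
def D(p):
    """Derivative of the polynomial with coefficient list p."""
    return [k * a for k, a in enumerate(p)][1:] or [0]

def padd(p, q):
    """Sum of two polynomials, padding the shorter list with zeros."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def pscale(c, p):
    """Product of the number c with the polynomial p."""
    return [c * a for a in p]

def D_plus_I(p):
    """The map D + I, whose value at f is Df + f."""
    return padd(D(p), p)

f = [1, 0, 3]      # 1 + 3x^2
g = [0, 2]         # 2x
c = 5

additive = D(padd(f, g)) == padd(D(f), D(g))      # D(f + g) = Df + Dg
homogeneous = D(pscale(c, f)) == pscale(c, D(f))  # D(cf) = cDf
shifted = D_plus_I(f)                             # 6x + (1 + 3x^2) = 1 + 6x + 3x^2
```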

Let T: V -> V' be a linear mapping. Let u, v, w be elements of V. Then

T(u + v + w) = T(u) + T(v) + T(w).

This can be seen stepwise, using the definition of linear mappings. Thus

T(u + v + w) = T(u + v) + T(w) = T(u) + T(v) + T(w).

Similarly, given a sum of more than three elements, an analogous
property is satisfied. For instance, let u_1, ..., u_n be elements of V. Then

T(u_1 + ... + u_n) = T(u_1) + ... + T(u_n).

The sum on the right can be taken in any order. A formal proof can
easily be given by induction, and we omit it.

If a_1, ..., a_n are numbers, then

T(a_1 u_1 + ... + a_n u_n) = a_1 T(u_1) + ... + a_n T(u_n).

We show this for three elements.

T(a_1 u + a_2 v + a_3 w) = T(a_1 u) + T(a_2 v) + T(a_3 w)
                         = a_1 T(u) + a_2 T(v) + a_3 T(w).

The next theorem will show us how a linear map is determined when
we know its value on basis elements.

Theorem 2.1. Let V and W be vector spaces. Let {v_1, ..., v_n} be a basis
of V, and let w_1, ..., w_n be arbitrary elements of W. Then there exists a
unique linear mapping T: V -> W such that

T(v_1) = w_1, ..., T(v_n) = w_n.

If x_1, ..., x_n are numbers, then

T(x_1 v_1 + ... + x_n v_n) = x_1 w_1 + ... + x_n w_n.


Proof. We shall prove that a linear map T satisfying the required
conditions exists. Let v be an element of V, and let x_1, ..., x_n be the
unique numbers such that v = x_1 v_1 + ... + x_n v_n. We let

T(v) = x_1 w_1 + ... + x_n w_n.

We then have defined a mapping T from V into W, and we contend that
T is linear. If v' is an element of V, and if v' = y_1 v_1 + ... + y_n v_n, then

v + v' = (x_1 + y_1)v_1 + ... + (x_n + y_n)v_n.

By definition, we obtain

T(v + v') = (x_1 + y_1)w_1 + ... + (x_n + y_n)w_n
          = x_1 w_1 + y_1 w_1 + ... + x_n w_n + y_n w_n
          = T(v) + T(v').

Let c be a number. Then cv = c x_1 v_1 + ... + c x_n v_n, and hence

T(cv) = c x_1 w_1 + ... + c x_n w_n = cT(v).

We have therefore proved that T is linear, and hence that there exists a
linear map as asserted in the theorem.

Such a map is unique, because for any element x_1 v_1 + ... + x_n v_n of V,
any linear map F: V -> W such that F(v_i) = w_i (i = 1, ..., n) must also
satisfy

F(x_1 v_1 + ... + x_n v_n) = x_1 F(v_1) + ... + x_n F(v_n)
                           = x_1 w_1 + ... + x_n w_n.

This concludes the proof.
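
The existence part of the proof is constructive: to evaluate T at v, find the coordinates of v with respect to the basis and form the same combination of the w_i. The Python sketch below carries this out for V = W = R^2 only, with coordinates found by Cramer's rule and exact arithmetic from the standard fractions module. The basis, the target vectors, and the function names (coordinates, linear_map_from_basis) are hypothetical choices for illustration, and vectors are assumed to have integer entries.

```python
from fractions import Fraction

def coordinates(v, v1, v2):
    """Coordinates (x1, x2) of v with respect to the basis {v1, v2} of R^2,
    so that v = x1 v1 + x2 v2 (Cramer's rule, integer entries assumed)."""
    det = v1[0] * v2[1] - v1[1] * v2[0]
    if det == 0:
        raise ValueError("v1, v2 do not form a basis")
    x1 = Fraction(v[0] * v2[1] - v[1] * v2[0], det)
    x2 = Fraction(v1[0] * v[1] - v1[1] * v[0], det)
    return x1, x2

def linear_map_from_basis(v1, v2, w1, w2):
    """The unique linear T with T(v1) = w1 and T(v2) = w2, as in Theorem 2.1."""
    def T(v):
        x1, x2 = coordinates(v, v1, v2)
        return tuple(x1 * a + x2 * b for a, b in zip(w1, w2))
    return T

# Assumed data: a basis of R^2 and prescribed values on it.
v1, v2 = (1, 1), (1, -1)
w1, w2 = (2, 0), (0, 4)
T = linear_map_from_basis(v1, v2, w1, w2)

value_at_e1 = T((1, 0))   # (1, 0) = (1/2) v1 + (1/2) v2, so T(1, 0) = (1, 2)
```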

III, §2. EXERCISES

1. Determine which of the following mappings F are linear.

(a) F: R^3 -> R^2 defined by F(x, y, z) = (x, z)

(b) F: R^4 -> R^4 defined by F(X) = -X

(c) F: R^3 -> R^3 defined by F(X) = X + (0, -1, 0)

(d) F: R^2 -> R^2 defined by F(x, y) = (2x + y, y)

(e) F: R^2 -> R^2 defined by F(x, y) = (2x, y - x)

(f) F: R^2 -> R^2 defined by F(x, y) = (y, x)

(g) F: R^2 -> R defined by F(x, y) = xy

(h) Let U be an open subset of R^3, and let V be the vector space of
differentiable functions on U. Let V' be the vector space of vector fields on
U. Then grad: V -> V' is a mapping. Is it linear? (For this part (h) we
assume you know some calculus.)

2. Let T: V -> W be a linear map from one vector space into another. Show
that T(O) = O.

3. Let T: V -> W be a linear map. Let u, v be elements of V, and let Tu = w. If
Tv = O, show that T(u + v) is also equal to w.

4. Let T: V -> W be a linear map. Let U be the subset of elements u ∈ V such
that T(u) = O. Let w ∈ W and suppose there is some element v_0 ∈ V such
that T(v_0) = w. Show that the set of elements v ∈ V satisfying T(v) = w is
precisely v_0 + U.

5. Let T: V -> W be a linear map. Let v be an element of V. Show that
T(-v) = -T(v).

6. Let V be a vector space, and f: V -> R, g: V -> R two linear mappings. Let
F: V -> R^2 be the mapping defined by F(v) = (f(v), g(v)). Show that F is
linear. Generalize.

7. Let V, W be two vector spaces and let F: V -> W be a linear map. Let U be
the subset of V consisting of all elements v such that F(v) = O. Prove that U
is a subspace of V.

8. Which of the mappings in Exercises 4, 7, 8, 9, of §1 are linear? 

9. Let V be a vector space over R, and let v, w ∈ V. The line passing through v
and parallel to w is defined to be the set of all elements v + tw with t ∈ R.
The line segment between v and v + w is defined to be the set of all elements

v + tw with 0 ≤ t ≤ 1.

Let L: V -> U be a linear map. Show that the image under L of a line
segment in V is a line segment in U. Between what points?

Show that the image of a line under L is either a line or a point.

Let V be a vector space, and let v_1, v_2 be two elements of V which are
linearly independent. The set of elements of V which can be written in the
form t_1 v_1 + t_2 v_2 with numbers t_1, t_2 satisfying

0 ≤ t_1 ≤ 1 and 0 ≤ t_2 ≤ 1

is called the parallelogram spanned by v_1, v_2.

10. Let V and W be vector spaces, and let F: V -> W be a linear map. Let v_1, v_2
be linearly independent elements of V, and assume that F(v_1), F(v_2) are
linearly independent. Show that the image under F of the parallelogram
spanned by v_1 and v_2 is the parallelogram spanned by F(v_1), F(v_2).

11. Let F be a linear map from R^2 into itself such that

F(E_1) = (1, 1) and F(E_2) = (-1, 2).

Let S be the square whose corners are at (0, 0), (1, 0), (1, 1), and (0, 1). Show
that the image of this square under F is a parallelogram.

12. Let A, B be two non-zero vectors in the plane such that there is no constant
c ≠ 0 such that B = cA. Let T be a linear mapping of the plane into itself
such that T(E_1) = A and T(E_2) = B. Describe the image under T of the
rectangle whose corners are (0, 1), (3, 0), (0, 0), and (3, 1).

13. Let A, B be two non-zero vectors in the plane such that there is no constant
c ≠ 0 such that B = cA. Describe geometrically the set of points tA + uB for
values of t and u such that 0 ≤ t ≤ 5 and 0 ≤ u ≤ 2.

14. Let T_u: V -> V be the translation by a vector u. For which vectors u is T_u a
linear map? Proof?

15. Let V, W be two vector spaces, and F: V -> W a linear map. Let w_1, ..., w_n be
elements of W which are linearly independent, and let v_1, ..., v_n be elements of
V such that F(v_i) = w_i for i = 1, ..., n. Show that v_1, ..., v_n are linearly
independent.

16. Let V be a vector space and F: V -> R a linear map. Let W be the subset of
V consisting of all elements v such that F(v) = 0. Assume that W ≠ V, and
let v_0 be an element of V which does not lie in W. Show that every element
of V can be written as a sum w + c v_0, with some w in W and some number
c.

17. In Exercise 16, show that W is a subspace of V. Let {v_1, ..., v_n} be a basis of
W. Show that {v_0, v_1, ..., v_n} is a basis of V.

18. Let L: R^2 -> R^2 be a linear map, having the following effect on the indicated
vectors:

(a) L(3, 1) = (1, 2) and L(-1, 0) = (1, 1)

(b) L(4, 1) = (1, 1) and L(1, 1) = (3, -2)

(c) L(1, 1) = (2, 1) and L(-1, 1) = (6, 3).

In each case compute L(1, 0).

19. Let L be as in (a), (b), (c), of Exercise 18. Find L(0, 1).

III, §3. THE KERNEL AND IMAGE OF A LINEAR MAP

Let V, W be vector spaces over K, and let F: V -> W be a linear map.
We define the kernel of F to be the set of elements v ∈ V such that
F(v) = O.

We denote the kernel of F by Ker F.

Example 1. Let L: R^3 -> R be the map such that

L(x, y, z) = 3x - 2y + z.

Thus if A = (3, -2, 1), then we can write

L(X) = X · A = A · X.

Then the kernel of L is the set of solutions of the equation

3x - 2y + z = 0.

Of course, this generalizes to n-space. If A is an arbitrary vector in R^n,
we can define the linear map

L_A: R^n -> R

such that L_A(X) = A · X. Its kernel can be interpreted as the set of all X
which are perpendicular to A.

Example 2. Let P: R^3 -> R^2 be the projection, such that

P(x, y, z) = (x, y).

Then P is a linear map whose kernel consists of all vectors in R^3 whose
first two coordinates are equal to 0, i.e. all vectors

(0, 0, z)

with arbitrary component z.

We shall now prove that the kernel of a linear map F: V -> W is a
subspace of V. Since F(O) = O, we see that O is in the kernel. Let v, w
be in the kernel. Then F(v + w) = F(v) + F(w) = O + O = O, so that
v + w is in the kernel. If c is a number, then F(cv) = cF(v) = O so that
cv is also in the kernel. Hence the kernel is a subspace.

The kernel of a linear map is useful to determine when the map is
injective. Namely, let F: V -> W be a linear map. We contend that the
following two conditions are equivalent:

1. The kernel of F is equal to {O}.

2. If v, w are elements of V such that F(v) = F(w), then v = w. In other
words, F is injective.

To prove our contention, assume first that Ker F = {O}, and suppose
that v, w are such that F(v) = F(w). Then

F(v - w) = F(v) - F(w) = O.

Thus v - w lies in the kernel. By assumption, v - w = O, and hence v = w.

Conversely, assume that F is injective. If v is such that

F(v) = F(O) = O,

we conclude that v = O.

The kernel of F is also useful to describe the set of all elements of V 
which have a given image in W under F. We refer the reader to Exercise 
4 for this. 

Theorem 3.1. Let F: V -> W be a linear map whose kernel is {O}. If
v_1,...,v_n are linearly independent elements of V, then F(v_1),...,F(v_n) are
linearly independent elements of W.

Proof. Let x_1,...,x_n be numbers such that

x_1 F(v_1) + ... + x_n F(v_n) = O.

By linearity, we get

F(x_1 v_1 + ... + x_n v_n) = O.

Since the kernel of F is {O}, we get x_1 v_1 + ... + x_n v_n = O. Since
v_1,...,v_n are linearly independent, it follows that x_i = 0 for i = 1,...,n.
This proves our theorem.

Let F: V -> W be a linear map. The image of F is the set of elements
w in W such that there exists an element v of V such that F(v) = w.

The image of F is a subspace of W. 

To prove this, observe first that F(O) = O, and hence O is in the image.
Next, suppose that w_1, w_2 are in the image. Then there exist elements
v_1, v_2 of V such that F(v_1) = w_1 and F(v_2) = w_2. Hence

F(v_1 + v_2) = F(v_1) + F(v_2) = w_1 + w_2,

thereby proving that w_1 + w_2 is in the image. If c is a number, then

F(cv_1) = cF(v_1) = cw_1.

Hence cw_1 is in the image. This proves that the image is a subspace of
W.


We denote the image of F by Im F. 

The next theorem relates the dimensions of the kernel and image of a 
linear map with the dimension of the space on which the map is defined. 

Theorem 3.2. Let V be a vector space. Let L: V -> W be a linear map
of V into another space W. Let n be the dimension of V, q the dimension
of the kernel of L, and s the dimension of the image of L. Then
n = q + s. In other words,

dim V = dim Ker L + dim Im L.

Proof. If the image of L consists of O only, then our assertion is trivial.
We may therefore assume that s > 0. Let {w_1,...,w_s} be a basis of
the image of L. Let v_1,...,v_s be elements of V such that L(v_i) = w_i for
i = 1,...,s. If the kernel of L is not {O}, let {u_1,...,u_q} be a basis of the
kernel. If the kernel is {O}, it is understood that all reference to
{u_1,...,u_q} is to be omitted in what follows. We contend that
{v_1,...,v_s,u_1,...,u_q} is a basis of V. This will suffice to prove our
assertion. Let v be any element of V. Then there exist numbers x_1,...,x_s such
that

L(v) = x_1 w_1 + ... + x_s w_s,

because {w_1,...,w_s} is a basis of the image of L. By linearity,

L(v) = L(x_1 v_1 + ... + x_s v_s),

and again by linearity, subtracting the right-hand side from the left-hand
side, it follows that

L(v - x_1 v_1 - ... - x_s v_s) = O.

Hence v - x_1 v_1 - ... - x_s v_s lies in the kernel of L, and there exist
numbers y_1,...,y_q such that

v - x_1 v_1 - ... - x_s v_s = y_1 u_1 + ... + y_q u_q.

Hence

v = x_1 v_1 + ... + x_s v_s + y_1 u_1 + ... + y_q u_q

is a linear combination of v_1,...,v_s,u_1,...,u_q. This proves that these
s + q elements of V generate V.

We now show that they are linearly independent, and hence that they
constitute a basis. Suppose that there exists a linear relation:

x_1 v_1 + ... + x_s v_s + y_1 u_1 + ... + y_q u_q = O.

Applying L to this relation, and using the fact that L(u_j) = O for
j = 1,...,q, we obtain

x_1 L(v_1) + ... + x_s L(v_s) = O.

But L(v_1),...,L(v_s) are none other than w_1,...,w_s, which have been
assumed linearly independent. Hence x_i = 0 for i = 1,...,s. Hence

y_1 u_1 + ... + y_q u_q = O.

But u_1,...,u_q constitute a basis of the kernel of L, and in particular, are
linearly independent. Hence all y_j = 0 for j = 1,...,q. This concludes the
proof of our assertion.

Example 1 (Cont.). The linear map L: R^3 -> R of Example 1 is given
by the formula

L(x, y, z) = 3x - 2y + z.

Its kernel consists of all solutions of the equation

3x - 2y + z = 0.

Its image is a subspace of R, is not {0}, and hence consists of all of R.
Thus its image has dimension 1. Hence its kernel has dimension 2.

Example 2 (Cont.). The projection P: R^3 -> R^2 of Example 2 is
obviously surjective, and its kernel has dimension 1.
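
The dimension count of Theorem 3.2 can be verified mechanically for the two examples. The sketch below is not the method of the text: it assumes each map is given by its matrix of coefficients (rows [3, -2, 1] for L, and [1, 0, 0], [0, 1, 0] for P), and it uses row reduction with exact fractions. The helper names rref, kernel_basis and dims are hypothetical.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row echelon form, list of pivot columns)."""
    M = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    r = 0
    for c in range(len(M[0])):
        # look for a row at or below r with a non-zero entry in column c
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        M[r] = [x / M[r][c] for x in M[r]]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == len(M):
            break
    return M, pivots

def kernel_basis(rows):
    """Basis of the solutions X of AX = O, one vector per non-pivot column."""
    M, pivots = rref(rows)
    n = len(rows[0])
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[free] = Fraction(1)
        for r, pc in enumerate(pivots):
            v[pc] = -M[r][free]
        basis.append(v)
    return basis

def dims(rows):
    """Return (dim V, dim Ker, dim Im) for the map X -> AX."""
    _, pivots = rref(rows)
    return len(rows[0]), len(kernel_basis(rows)), len(pivots)

print(dims([[3, -2, 1]]))            # Example 1
print(dims([[1, 0, 0], [0, 1, 0]]))  # Example 2
```

In both cases the first number is the sum of the other two. The kernel dimension is obtained by actually producing a basis of the kernel, so the check is not circular.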

In Chapter V, §3 we shall investigate in general the dimension of the 
space of solutions of a system of homogeneous linear equations. 

Theorem 3.3. Let L: V -> W be a linear map. Assume that

dim V = dim W.

If Ker L = {O}, or if Im L = W, then L is bijective.

Proof. Suppose Ker L = {O}. By the formula of Theorem 3.2 we
conclude that dim Im L = dim W. By Corollary 3.5 of Chapter I it follows
that L is surjective. But L is also injective since Ker L = {O}. Hence L
is bijective as was to be shown. The proof that Im L = W implies L
bijective is similar and is left to the reader.


III, §3. EXERCISES

1. Let A, B be two vectors in R^2 forming a basis of R^2. Let F: R^2 -> R^n be a
linear map. Show that either F(A), F(B) are linearly independent, or the
image of F has dimension 1, or the image of F is {O}.

2. Let A be a non-zero vector in R^2. Let F: R^2 -> W be a linear map such that
F(A) = O. Show that the image of F is either a straight line or {O}.

3. Determine the dimension of the subspace of R^4 consisting of all X ∈ R^4 such
that

x_1 + 2x_2 = 0 and x_3 - 15x_4 = 0.

4. Let L: V -> W be a linear map. Let w be an element of W. Let v_0 be an
element of V such that L(v_0) = w. Show that any solution of the equation
L(X) = w is of type v_0 + u, where u is an element of the kernel of L.

5. Let V be the vector space of functions which have derivatives of all orders,
and let D: V -> V be the derivative. What is the kernel of D?

6. Let D^2 be the second derivative (i.e. the iteration of D taken twice). What is
the kernel of D^2? In general, what is the kernel of D^n (n-th derivative)?

7. Let V be again the vector space of functions which have derivatives of all
orders. Let W be the subspace of V consisting of those functions f such that

f'' + 4f = 0 and f(π) = 0.

Determine the dimension of W.

8. Let V be the vector space of all infinitely differentiable functions. We write
the functions as functions of a variable t, and let D = d/dt. Let a_1,...,a_m be
numbers. Let g be an element of V. Describe how the problem of finding a
solution of the differential equation

a_m d^m f/dt^m + a_{m-1} d^{m-1} f/dt^{m-1} + ... + a_0 f = g

can be interpreted as fitting the abstract situation described in Exercise 4.

9. Again let V be the space of all infinitely differentiable functions, and let
D: V -> V be the derivative.

(a) Let L = D - I where I is the identity mapping. What is the kernel of L?

(b) Same question if L = D - aI, where a is a number.

10. (a) What is the dimension of the subspace of K^n consisting of those vectors
A = (a_1,...,a_n) such that a_1 + ... + a_n = 0?

(b) What is the dimension of the subspace of the space of n x n matrices (a_ij)
such that

a_11 + ... + a_nn = Σ_{i=1}^n a_ii = 0?

[For part (b), look at the next exercise.]

11. Let A = (a_ij) be an n x n matrix. Define the trace of A to be the sum of the
diagonal elements, that is

tr(A) = Σ_{i=1}^n a_ii.

(a) Show that the trace is a linear map of the space of n x n matrices into
K.

(b) If A, B are n x n matrices, show that tr(AB) = tr(BA).

(c) If B is invertible, show that tr(B^{-1}AB) = tr(A).

(d) If A, B are n x n matrices, show that the association

(A, B) |-> tr(AB) = <A, B>

satisfies the three conditions of a scalar product. (For the general
definition, cf. Chapter V.)

(e) Prove that there are no matrices A, B such that

AB - BA = I_n.

12. Let S be the set of symmetric n x n matrices. Show that S is a vector space.
What is the dimension of S? Exhibit a basis for S, when n = 2 and n = 3.

13. Let A be a real symmetric n x n matrix. Show that

tr(AA) ≥ 0,

and if A ≠ O, then tr(AA) > 0.

14. An n x n matrix A is called skew-symmetric if ^tA = -A. Show that any
n x n matrix A can be written as a sum

A = B + C,

where B is symmetric and C is skew-symmetric. [Hint: Let B = (A + ^tA)/2.]
Show that if A = B_1 + C_1, where B_1 is symmetric and C_1 is skew-symmetric,
then B = B_1 and C = C_1.

15. Let M be the space of all n x n matrices. Let

P: M -> M

be the map such that

P(A) = (A + ^tA)/2.

(a) Show that P is linear.

(b) Show that the kernel of P consists of the space of skew-symmetric
matrices.

(c) What is the dimension of the kernel of P?

16. Let M be the space of all n x n matrices. Let

F: M -> M

be the map such that

F(A) = (A - ^tA)/2.

(a) Show that F is linear.

(b) Describe the kernel of F, and determine its dimension.

17. (a) Let U, W be vector spaces. We let U x W be the set of all pairs
(u, w) with u ∈ U and w ∈ W. If (u_1, w_1), (u_2, w_2) are such pairs, define
their sum

(u_1, w_1) + (u_2, w_2) = (u_1 + u_2, w_1 + w_2).

If c is a number, define c(u, w) = (cu, cw). Show that U x W is a vector
space with these definitions. What is the zero element?

(b) If U has dimension n and W has dimension m, what is the dimension of
U x W? Exhibit a basis of U x W in terms of a basis for U and a basis
for W.

(c) If U is a subspace of a vector space V, show that the subset of V x V
consisting of all elements (u, u) with u ∈ U is a subspace.

18. (To be done after you have done Exercise 17.) Let U, W be subspaces of a
vector space V. Show that

dim U + dim W = dim(U + W) + dim(U ∩ W).

[Hint: Show that the map

L: U x W -> V

given by

L(u, w) = u - w

is a linear map. What is its image? What is its kernel?]

III, §4. COMPOSITION AND INVERSE OF LINEAR
MAPPINGS

In §1 we have mentioned the fact that we can compose arbitrary maps. 
We can say something additional in the case of linear maps. 

Theorem 4.1. Let U, V, W be vector spaces over a field K. Let

F: U -> V and G: V -> W

be linear maps. Then the composite map G∘F is also a linear map.

Proof. This is very easy to prove. Let u, v be elements of U. Since F
is linear, we have F(u + v) = F(u) + F(v). Hence

(G∘F)(u + v) = G(F(u + v)) = G(F(u) + F(v)).

Since G is linear, we obtain

G(F(u) + F(v)) = G(F(u)) + G(F(v)).

Hence

(G∘F)(u + v) = (G∘F)(u) + (G∘F)(v).

Next, let c be a number. Then

(G∘F)(cu) = G(F(cu))
          = G(cF(u))     (because F is linear)
          = cG(F(u))     (because G is linear).

This proves that G∘F is a linear mapping.

The next theorem states that some of the rules of arithmetic concerning
the product and sum of numbers also apply to the composition and
sum of linear mappings.

Theorem 4.2. Let U, V, W be vector spaces over a field K. Let

F: U -> V

be a linear mapping, and let G, H be two linear mappings of V into W.
Then

(G + H)∘F = G∘F + H∘F.

If c is a number, then

(cG)∘F = c(G∘F).

If T: U -> V is a linear mapping from U into V, then

G∘(F + T) = G∘F + G∘T.

The proofs are all simple. We shall just prove the first assertion and
leave the others as exercises.

Let u be an element of U. We have:

((G + H)∘F)(u) = (G + H)(F(u)) = G(F(u)) + H(F(u))
               = (G∘F)(u) + (H∘F)(u).

By definition, it follows that (G + H)∘F = G∘F + H∘F.

It may happen that U = V = W. Let F: U -> U and G: U -> U be two
linear mappings. Then we may form F∘G and G∘F. It is not always
true that these two composite mappings are equal. As an example, let
U = R^3. Let F be the linear mapping given by

F(x, y, z) = (x, y, 0)

and let G be the linear mapping given by

G(x, y, z) = (x, z, 0).

Then

(G∘F)(x, y, z) = (x, 0, 0),

but

(F∘G)(x, y, z) = (x, z, 0).
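
The two composites of this example can be evaluated directly. The following short sketch uses a hypothetical helper compose and a sample point p chosen only for illustration.

```python
def F(p):
    """F(x, y, z) = (x, y, 0)."""
    x, y, z = p
    return (x, y, 0)

def G(p):
    """G(x, y, z) = (x, z, 0)."""
    x, y, z = p
    return (x, z, 0)

def compose(f, g):
    """Return the composite f o g, i.e. the map p -> f(g(p))."""
    return lambda p: f(g(p))

p = (1, 2, 3)              # sample point, chosen for illustration
print(compose(G, F)(p))    # (G o F)(p)
print(compose(F, G)(p))    # (F o G)(p)
```

For this point the first composite gives (1, 0, 0) and the second gives (1, 3, 0), so the two mappings differ.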

Let F: V -> V be a linear map of a vector space into itself. One
sometimes calls F an operator. Then we can form the composite F∘F, which
is again a linear map of V into itself. Similarly, we can form the
composite

F∘F∘...∘F

of F with itself n times for any integer n ≥ 1. We shall denote this
composite by F^n. If n = 0, we define F^0 = I (identity map). We have the
rules

F^{r+s} = F^r∘F^s

for integers r, s ≥ 0.

Theorem 4.3. Let F: U -> V be a linear map, and assume that this map
has an inverse mapping G: V -> U. Then G is a linear map.

Proof. Let v_1, v_2 ∈ V. We must first show that

G(v_1 + v_2) = G(v_1) + G(v_2).

Let u_1 = G(v_1) and u_2 = G(v_2). By definition, this means that

F(u_1) = v_1 and F(u_2) = v_2.

Since F is linear, we find that

F(u_1 + u_2) = F(u_1) + F(u_2) = v_1 + v_2.

By definition of the inverse map, this means that G(v_1 + v_2) = u_1 + u_2,
thus proving what we wanted. We leave the proof that G(cv) = cG(v) as
an exercise (Exercise 3).

Corollary 4.4. Let F: U -> V be a linear map whose kernel is {O}, and
which is surjective. Then F has an inverse linear map.

Proof. We had seen in §3 that if the kernel of F is {O}, then F is
injective. Hence we conclude that F is both injective and surjective, so
that an inverse mapping exists, and is linear by Theorem 4.3.

Example 1. Let F: R^2 -> R^2 be the linear map such that

F(x, y) = (3x - y, 4x + 2y).

We wish to show that F has an inverse. First note that the kernel of F
is {O}, because if

3x - y = 0,
4x + 2y = 0,

then we can solve for x, y in the usual way: Multiply the first equation
by 2 and add it to the second. We find 10x = 0, whence x = 0, and then
y = 0 because y = 3x. Hence F is injective, because its kernel is {O}. By
Theorem 3.2 it follows that the image of F has dimension 2. But the
image of F is a subspace of R^2, which has also dimension 2, and hence this
image is equal to all of R^2, so that F is surjective. Hence F has an
inverse, and this inverse is a linear map by Theorem 4.3.
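
The text proves only that the inverse exists. As an illustration, one can also write it down by solving 3x - y = a, 4x + 2y = b with the same elimination: 10x = 2a + b and y = 3x - a. The formula for G below is derived this way and is not stated in the text; the sketch checks it on a few sample points.

```python
from fractions import Fraction

def F(x, y):
    """The map of Example 1."""
    return (3 * x - y, 4 * x + 2 * y)

def G(a, b):
    """Candidate inverse, obtained by solving 3x - y = a, 4x + 2y = b."""
    x = Fraction(2 * a + b) / 10       # from 10x = 2a + b
    y = Fraction(-4 * a + 3 * b) / 10  # from y = 3x - a
    return (x, y)

samples = [(1, 0), (0, 1), (2, -5), (7, 3)]
for x, y in samples:
    assert G(*F(x, y)) == (x, y)   # G o F is the identity on the samples
    assert F(*G(x, y)) == (x, y)   # F o G is the identity on the samples
print(G(1, 0), G(0, 1))
```

Both coordinates of G are linear expressions in a and b, as Theorem 4.3 predicts.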

A linear map F: U -> V which has an inverse G: V -> U (we also say
invertible) is called an isomorphism.

Example 2. Let V be a vector space of dimension n. Let

{v_1,...,v_n}

be a basis for V. Let

L: R^n -> V

be the map such that

L(x_1,...,x_n) = x_1 v_1 + ... + x_n v_n.

Then L is an isomorphism.

Proof. The kernel of L is {O}, because if

x_1 v_1 + ... + x_n v_n = O,

then all x_i = 0 (since v_1,...,v_n are linearly independent). The image of L
is all of V, because v_1,...,v_n generate V. By Corollary 4.4, it follows that
L is an isomorphism.

Remark on notation. Let

F: V -> V and G: V -> V

be linear maps of a vector space into itself. We often, and even usually,
write

FG instead of F∘G.

In other words, we omit the little circle ∘ between F and G. The
distributive law then reads as with numbers

F(G + H) = FG + FH.

The only thing to watch out for is that F, G may not commute, that is
usually

FG ≠ GF.

If F and G commute, then you can work with the arithmetic of linear
maps just as with the arithmetic of numbers.

Powers I, F, F^2, F^3,... do commute with each other.


III, §4. EXERCISES

1. Let L: R^2 -> R^2 be a linear map such that L ≠ O but L^2 = L∘L = O. Show
that there exists a basis {A, B} of R^2 such that

L(A) = B and L(B) = O.

2. Let dim V > dim W. Let L: V -> W be a linear map. Show that the kernel of
L is not {O}.

3. Finish the proof of Theorem 4.3.

4. Let dim V = dim W. Let L: V -> W be a linear map whose kernel is {O}.
Show that L has an inverse linear map.

5. Let F, G be invertible linear maps of a vector space V onto itself. Show that

(F∘G)^{-1} = G^{-1}∘F^{-1}.

6. Let L: R^2 -> R^2 be the linear map defined by

L(x, y) = (x + y, x - y).

Show that L is invertible.

7. Let L: R^2 -> R^2 be the linear map defined by

L(x, y) = (2x + y, 3x - 5y).

Show that L is invertible.

8. Let L: R^3 -> R^3 be the linear maps as indicated. Show that L is invertible in
each case.

(a) L(x, y, z) = (x - y, x + z, x + y + 2z)

(b) L(x, y, z) = (2x - y + z, x + y, 3x + y + z)

9. (a) Let L: V -> V be a linear mapping such that L^2 = O. Show that I - L is
invertible. (I is the identity mapping on V.)

(b) Let L: V -> V be a linear map such that L^2 + 2L + I = O. Show that L is
invertible.

(c) Let L: V -> V be a linear map such that L^3 = O. Show that I - L is
invertible.

10. Let V be a vector space. Let P: V -> V be a linear map such that P^2 = P.
Show that

V = Ker P + Im P and Ker P ∩ Im P = {O},

in other words, V is the direct sum of Ker P and Im P. [Hint: To show V is
the sum, write an element of V in the form v = v - Pv + Pv.]

11. Let V be a vector space, and let P, Q be linear maps of V into itself. Assume
that they satisfy the following conditions:

(a) P + Q = I (identity mapping).

(b) PQ = QP = O.

(c) P^2 = P and Q^2 = Q.

Show that V is equal to the direct sum of Im P and Im Q.

12. Notations being as in Exercise 11, show that the image of P is equal to the
kernel of Q. [Prove the two statements:

Image of P is contained in kernel of Q,

Kernel of Q is contained in image of P.]

13. Let T: V -> V be a linear map such that T^2 = I. Let

P = (1/2)(I + T) and Q = (1/2)(I - T).

Prove:

P + Q = I; P^2 = P; Q^2 = Q; PQ = QP = O.

14. Let F: V -> W and G: W -> U be isomorphisms of vector spaces over K.
Show that G∘F is invertible, and that

(G∘F)^{-1} = F^{-1}∘G^{-1}.

15. Let F: V -> W and G: W -> U be isomorphisms of vector spaces over K.
Show that G∘F: V -> U is an isomorphism.

16. Let V, W be two vector spaces over K, of finite dimension n. Show that V
and W are isomorphic.

17. Let A be a linear map of a vector space into itself, and assume that

A^2 - A + I = O

(where I is the identity map). Show that A^{-1} exists and is equal to I - A.
Generalize (cf. Exercise 37 of Chapter II, §3).

18. Let A, B be linear maps of a vector space into itself. Assume that AB = BA.
Show that

(A + B)^2 = A^2 + 2AB + B^2

and

(A + B)(A - B) = A^2 - B^2.

19. Let A, B be linear maps of a vector space into itself. If the kernel of A is
{O} and the kernel of B is {O}, show that the kernel of AB is also {O}.

20. More generally, let A: V -> W and B: W -> U be linear maps. Assume that the
kernel of A is {O} and the kernel of B is {O}. Show that the kernel of BA is
{O}.

21. Let A: V -> W and B: W -> U be linear maps. Assume that A is surjective and
that B is surjective. Show that BA is surjective.

III, §5. GEOMETRIC APPLICATIONS 

Let V be a vector space and let v, u be elements of V. We define the line
segment between v and v + u to be the set of all points

v + tu,    0 ≤ t ≤ 1.

This line segment is illustrated in the following figure.

[Figure 3: the line segment from v to v + u]

For instance, if t = 1/2, then v + (1/2)u is the point midway between v and
v + u. Similarly, if t = 1/3, then v + (1/3)u is the point one third of the way
between v and v + u (Fig. 3).

If v, w are elements of V, let u = w - v. Then the line segment between
v and w is the set of all points v + tu, or

v + t(w - v),    0 ≤ t ≤ 1.

Observe that we can rewrite the expression for these points in the form

(1)    (1 - t)v + tw,    0 ≤ t ≤ 1,

and letting s = 1 - t, t = 1 - s, we can also write it as

sv + (1 - s)w,    0 ≤ s ≤ 1.

Finally, we can write the points of our line segment in the form

(2)    t_1 v + t_2 w

with t_1, t_2 ≥ 0 and t_1 + t_2 = 1. Indeed, letting t = t_2, we see that every
point which can be written in the form (2) satisfies (1). Conversely, we
let t_1 = 1 - t and t_2 = t and see that every point of the form (1) can be
written in the form (2).

Let L: V -> V' be a linear map. Let S be the line segment in V
between two points v, w. Then the image L(S) of this line segment is the
line segment in V' between the points L(v) and L(w). This is obvious
from (2) because

L(t_1 v + t_2 w) = t_1 L(v) + t_2 L(w).
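
The statement about the image of a line segment can be tested on sample points. In the sketch below the linear map L, the endpoints v, w and the helper combine are all hypothetical choices made for illustration.

```python
from fractions import Fraction

def L(p):
    """A hypothetical linear map R^2 -> R^2, chosen only for illustration."""
    x, y = p
    return (2 * x + y, x - 3 * y)

def combine(t1, v, t2, w):
    """Return the point t1*v + t2*w."""
    return tuple(t1 * a + t2 * b for a, b in zip(v, w))

v, w = (1, 2), (4, -1)

# Walk along the segment in form (2): t1, t2 >= 0 and t1 + t2 = 1.
for k in range(5):
    t2 = Fraction(k, 4)
    t1 = 1 - t2
    image_of_point = L(combine(t1, v, t2, w))
    point_on_image_segment = combine(t1, L(v), t2, L(w))
    assert image_of_point == point_on_image_segment
print("each sampled point of S is mapped onto the segment from L(v) to L(w)")
```

The same coefficients t1, t2 describe a point of S and its image, which is why the image is again a line segment.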

We shall now generalize this discussion to higher dimensional figures.
Let v, w be linearly independent elements of the vector space V. We
define the parallelogram spanned by v, w to be the set of all points

t_1 v + t_2 w,    0 ≤ t_i ≤ 1 for i = 1, 2.

This definition is clearly justified since t_1 v is a point of the segment
between O and v (Fig. 5), and t_2 w is a point of the segment between O and
w. For all values of t_1, t_2 ranging independently between 0 and 1, we see
geometrically that t_1 v + t_2 w describes all points of the parallelogram.



At the end of §1 we defined translations. We obtain the most general
parallelogram (Fig. 6) by taking the translation of the parallelogram just
described. Thus if u is an element of V, the translation by u of the
parallelogram spanned by v and w consists of all points

u + t_1 v + t_2 w,    0 ≤ t_i ≤ 1 for i = 1, 2.

As with line segments, we see that if L: V -> V' is a linear map, then
the image under L of a parallelogram is a parallelogram (if it is not
degenerate), because it is the set of points

L(u + t_1 v + t_2 w) = L(u) + t_1 L(v) + t_2 L(w)

with

0 ≤ t_i ≤ 1 for i = 1, 2.

We shall now describe triangles. We begin with triangles located at
the origin. Let v, w again be linearly independent. We define the triangle
spanned by O, v, w to be the set of all points

(3)    t_1 v + t_2 w,    0 ≤ t_i and t_1 + t_2 ≤ 1.

We must convince ourselves that this is a reasonable definition. We do
this by showing that the triangle defined above coincides with the set of
points on all line segments between v and all the points of the segment
between O and w. From Fig. 7, this second description of a triangle
does coincide with our geometric intuition.

We denote the line segment between O and w by Ow. A point on Ow
can then be written tw with 0 ≤ t ≤ 1. The set of points between v and
tw is the set of points

(4)    sv + (1 - s)tw,    0 ≤ s ≤ 1.

Let t_1 = s and t_2 = (1 - s)t. Then

t_1 + t_2 = s + (1 - s)t ≤ s + (1 - s) = 1.

Hence all points satisfying (4) also satisfy (3). Conversely, suppose given
a point t_1 v + t_2 w satisfying (3), so that

t_1 + t_2 ≤ 1.

Then t_2 ≤ 1 - t_1. If t_1 = 1 then t_2 = 0 and we are done. If t_1 < 1, then
we let

s = t_1,    t = t_2/(1 - t_1).

Then 0 ≤ t ≤ 1 because t_2 ≤ 1 - t_1, and

t_1 v + t_2 w = t_1 v + (1 - t_1)(t_2/(1 - t_1))w = sv + (1 - s)tw,

which shows that every point satisfying (3) also satisfies (4). This justifies
our definition of a triangle.
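
The passage between the descriptions (3) and (4) is a change of coordinates, and it can be written out as two small functions. The names to_segment_form and to_triangle_form are hypothetical; the formulas are those of the argument above.

```python
from fractions import Fraction

def to_segment_form(t1, t2):
    """From (3)-coordinates (t1, t2 >= 0, t1 + t2 <= 1) to (s, t) as in (4)."""
    t1, t2 = Fraction(t1), Fraction(t2)
    if t1 == 1:
        # Then t2 = 0 and the point is v itself; any t works, we take t = 0.
        return Fraction(1), Fraction(0)
    return t1, t2 / (1 - t1)

def to_triangle_form(s, t):
    """From (4)-coordinates (0 <= s, t <= 1) to (t1, t2) as in (3)."""
    return s, (1 - s) * t

t1, t2 = Fraction(1, 4), Fraction(1, 2)
s, t = to_segment_form(t1, t2)
print(s, t)                     # coordinates in the form (4)
print(to_triangle_form(s, t))   # back to the form (3)
```

Going from (3) to (4) and back returns the original pair, and the intermediate value t stays between 0 and 1 as required.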

As with parallelograms, an arbitrary triangle is obtained by translating
a triangle located at the origin. In fact, we have the following
description of a triangle.

Let v_1, v_2, v_3 be elements of V such that v_1 - v_3 and v_2 - v_3 are
linearly independent. Let v = v_1 - v_3 and w = v_2 - v_3. Let S be the set
of points

(5)    t_1 v_1 + t_2 v_2 + t_3 v_3,    0 ≤ t_i for i = 1, 2, 3,
                                       t_1 + t_2 + t_3 = 1.

Then S is the translation by v_3 of the triangle spanned by O, v, w. (Cf.
Fig. 8.)



Figure 8 

Proof. Let P = t_1 v_1 + t_2 v_2 + t_3 v_3 be a point satisfying (5). Then

P = t_1(v_1 - v_3) + t_2(v_2 - v_3) + t_1 v_3 + t_2 v_3 + t_3 v_3
  = t_1 v + t_2 w + v_3,

and t_1 + t_2 ≤ 1. Hence our point P is a translation by v_3 of a point
satisfying (3). Conversely, given a point satisfying (3), which we translate by
v_3, we let t_3 = 1 - t_2 - t_1, and we can then reverse the steps we have
just taken to see that

t_1 v + t_2 w + v_3 = t_1 v_1 + t_2 v_2 + t_3 v_3.

This proves what we wanted.

Actually, it is (5) which is the most useful description of a triangle,
because the vertices v_1, v_2, v_3 occupy a symmetric position in this
definition.

One of the advantages of giving the definition of a triangle as we did
is that it is then easy to see what happens to a triangle under a linear
map. Let L: V -> W be a linear map, and let v, w be elements of V which
are linearly independent. Assume that L(v) and L(w) are also linearly
independent. Let S be the triangle spanned by O, v, w. Then the image of
S under L, namely L(S), is the triangle spanned by O, L(v), L(w).
Indeed, it is the set of all points

L(t_1 v + t_2 w) = t_1 L(v) + t_2 L(w)

with

0 ≤ t_i and t_1 + t_2 ≤ 1.

Similarly, let S be the triangle spanned by v_1, v_2, v_3. Then the image
of S under L is the triangle spanned by L(v_1), L(v_2), L(v_3) (if these do
not lie on a straight line) because it consists of the set of points

L(t_1 v_1 + t_2 v_2 + t_3 v_3) = t_1 L(v_1) + t_2 L(v_2) + t_3 L(v_3)

with 0 ≤ t_i and t_1 + t_2 + t_3 = 1.

The conditions of (5) are those which generalize to the fruitful
concept of convex set which we now discuss.

Let S be a subset of a vector space V. We shall say that S is convex if
given points P, Q in S the line segment between P and Q is contained in
S. In Fig. 9, the set on the left is convex. The set on the right is not
convex since the line segment between P and Q is not entirely contained
in S.

[Figure 9: a convex set (left) and a set which is not convex (right)]


Theorem 5.1. Let P_1,...,P_n be elements of a vector space V. Let S be
the set of all linear combinations

t_1 P_1 + ... + t_n P_n

with 0 ≤ t_i and t_1 + ... + t_n = 1. Then S is convex.

Proof. Let

P = t_1 P_1 + ... + t_n P_n

and

Q = s_1 P_1 + ... + s_n P_n

with 0 ≤ t_i, 0 ≤ s_i, and

t_1 + ... + t_n = 1,
s_1 + ... + s_n = 1.

Let 0 ≤ t ≤ 1. Then:

(1 - t)P + tQ
    = (1 - t)t_1 P_1 + ... + (1 - t)t_n P_n + ts_1 P_1 + ... + ts_n P_n
    = [(1 - t)t_1 + ts_1]P_1 + ... + [(1 - t)t_n + ts_n]P_n.

We have 0 ≤ (1 - t)t_i + ts_i for all i, and

(1 - t)t_1 + ts_1 + ... + (1 - t)t_n + ts_n
    = (1 - t)(t_1 + ... + t_n) + t(s_1 + ... + s_n)
    = (1 - t) + t
    = 1.

This proves our theorem.
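
The computation in this proof involves only the coefficients, so it can be checked without reference to the points P_1,...,P_n. In the sketch below the coefficient lists ts, ss and the value t are sample data chosen for illustration, and mix is a hypothetical helper.

```python
from fractions import Fraction

def mix(ts, ss, t):
    """Coefficients of (1 - t)P + tQ when P, Q have coefficients ts, ss."""
    return [(1 - t) * a + t * b for a, b in zip(ts, ss)]

ts = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]   # coefficients of P
ss = [Fraction(0), Fraction(1, 4), Fraction(3, 4)]      # coefficients of Q
t = Fraction(2, 5)

coeffs = mix(ts, ss, t)
assert all(c >= 0 for c in coeffs)   # each coefficient is non-negative
assert sum(coeffs) == 1              # and they add up to 1
print(coeffs)
```

The new coefficients again satisfy the two conditions of the theorem, so the point (1 - t)P + tQ lies in S.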

From Theorem 5.1, we see that a triangle, as we have defined it
analytically, is convex. The convex set of Theorem 5.1 is therefore a natural
generalization of a triangle (Fig. 10).



Figure 10 

We shall call the convex set of Theorem 5.1 the convex set spanned by
P_1,...,P_n. Although we shall not need the next result, it shows that this
convex set is the smallest convex set containing all the points P_1,...,P_n.

Theorem 5.2. Let P_1,...,P_n be points of a vector space V. Any convex
set S' which contains P_1,...,P_n also contains all linear combinations

t_1 P_1 + ... + t_n P_n

with 0 ≤ t_i for all i and t_1 + ... + t_n = 1.

Proof. We prove this by induction. If n = 1, then t_1 = 1, and our
assertion is obvious. Assume the theorem proved for some integer n - 1 ≥ 1.
We shall prove it for n. Let t_1,...,t_n be numbers satisfying the
conditions of the theorem. If t_n = 1, then our assertion is trivial because

t_1 = ... = t_{n-1} = 0.

Suppose that t_n ≠ 1. Then the linear combination t_1 P_1 + ... + t_n P_n is
equal to

(1 - t_n)((t_1/(1 - t_n))P_1 + ... + (t_{n-1}/(1 - t_n))P_{n-1}) + t_n P_n.

Let

s_i = t_i/(1 - t_n)    for i = 1,...,n - 1.

Then s_i ≥ 0 and s_1 + ... + s_{n-1} = 1 so that by induction, we conclude
that the point

Q = s_1 P_1 + ... + s_{n-1} P_{n-1}

lies in S'. But then

(1 - t_n)Q + t_n P_n = t_1 P_1 + ... + t_n P_n

lies in S' by definition of a convex set, as was to be shown.
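
The induction in this proof is constructive: it reaches the point t_1 P_1 + ... + t_n P_n by repeatedly taking one point on a line segment. The recursive sketch below follows that argument step by step. The function names are hypothetical, and the coefficients are assumed to be given as exact fractions.

```python
from fractions import Fraction

def segment_point(t, P, Q):
    """Return (1 - t)P + tQ, a point of the segment between P and Q."""
    return tuple((1 - t) * a + t * b for a, b in zip(P, Q))

def convex_combination(ts, points):
    """Compute t_1 P_1 + ... + t_n P_n using only segment_point.

    ts must be Fractions with 0 <= t_i and t_1 + ... + t_n = 1.
    """
    tn = ts[-1]
    if len(points) == 1 or tn == 1:
        # n = 1, or t_n = 1 and all other coefficients vanish
        return tuple(Fraction(c) for c in points[-1])
    ss = [t / (1 - tn) for t in ts[:-1]]       # s_i = t_i / (1 - t_n)
    Q = convex_combination(ss, points[:-1])    # lies in S' by induction
    return segment_point(tn, Q, points[-1])    # (1 - t_n)Q + t_n P_n

points = [(0, 0), (4, 0), (0, 4)]              # sample vertices
ts = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]
print(convex_combination(ts, points))
```

Every intermediate point is obtained from two points already known to lie in S', which is exactly why a convex set containing P_1,...,P_n must contain the result.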

Example. Let V be a vector space, and let L: V -> R be a linear map.
We contend that the set S of all elements v in V such that L(v) < 0 is
convex.

Proof. Let L(v) < 0 and L(w) < 0. Let 0 < t < 1. Then

L(tv + (1 - t)w) = tL(v) + (1 - t)L(w).

Then tL(v) < 0 and (1 - t)L(w) < 0 so tL(v) + (1 - t)L(w) < 0, whence
tv + (1 - t)w lies in S. If t = 0 or t = 1, then tv + (1 - t)w is equal to v
or w and thus also lies in S. This proves our assertion.

For a generalization of this example, see Exercise 6. 

For deeper theorems about convex sets, see the last chapter. 


III, §5. EXERCISES

1. Show that the image under a linear map of a convex set is convex. 

2. Let S_1 and S_2 be convex sets in V. Show that the intersection S_1 ∩ S_2 is
convex.

3. Let L: R^n -> R be a linear map. Let S be the set of all points A in R^n such
that L(A) ≥ 0. Show that S is convex.

4. Let L: R^n -> R be a linear map and c a number. Show that the set S
consisting of all points A in R^n such that L(A) > c is convex.

5. Let A be a non-zero vector in R^n and c a number. Show that the set of
points X such that X · A ≥ c is convex.

6. Let L: V -> W be a linear map. Let S' be a convex set in W. Let S be the set
of all elements P in V such that L(P) is in S'. Show that S is convex.

Remark. If you fumbled around with notation in Exercises 3, 4, 5 then show
why these exercises are special cases of Exercise 6, which gives the general
principle behind them. The set S in Exercise 6 is called the inverse image of S'
under L.

7. Show that a parallelogram is convex. 

8. Let S be a convex set in V and let u be an element of V. Let T_u: V -> V be
the translation by u. Show that the image T_u(S) is convex.

9. Let S be a convex set in the vector space V and let c be a number. Let cS
denote the set of all elements cv with v in S. Show that cS is convex.

10. Let v, w be linearly independent elements of a vector space V. Let F: V -> W
be a linear map. Assume that F(v), F(w) are linearly dependent. Show that
the image under F of the parallelogram spanned by v and w is either a point
or a line segment.



CHAPTER IV 


Linear Maps and Matrices 


IV, §1. THE LINEAR MAP ASSOCIATED WITH A MATRIX 


Let A = (a_ij) be an m x n matrix. We can then associate with A a map

L_A: K^n -> K^m

by letting

L_A(X) = AX

for every column vector X in K^n. Thus L_A is defined by the association
X |-> AX, the product being the product of matrices. That L_A is linear is
simply a special case of Theorem 3.1, Chapter II, namely the theorem
concerning properties of multiplication of matrices. Indeed, we have

A(X + Y) = AX + AY    and    A(cX) = cAX

for all vectors X, Y in K^n and all numbers c. We call L_A the linear map
associated with the matrix A.
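
The two linearity properties can be observed on a small case. In the following sketch the matrix A, the vectors X, Y and the number c are hypothetical sample data, and a matrix is represented as a list of rows.

```python
def L_A(A, X):
    """Apply the m x n matrix A (list of rows) to the column vector X."""
    return [sum(a * x for a, x in zip(row, X)) for row in A]

A = [[1, 2, 0],
     [0, -1, 3]]       # a hypothetical 2 x 3 matrix: L_A maps K^3 into K^2
X = [1, 1, 1]
Y = [2, 0, -1]
c = 5

sum_XY = [x + y for x, y in zip(X, Y)]
lhs_add = L_A(A, sum_XY)                                    # A(X + Y)
rhs_add = [p + q for p, q in zip(L_A(A, X), L_A(A, Y))]     # AX + AY
lhs_scal = L_A(A, [c * x for x in X])                       # A(cX)
rhs_scal = [c * p for p in L_A(A, X)]                       # cAX

print(lhs_add == rhs_add, lhs_scal == rhs_scal)
```

The vector X has three components and L_A(X) has two, which reflects that an m x n matrix gives a map from K^n into K^m.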


Example. If 


and X = 
then


L_A(X) =





Theorem 1.1. If A, B are m x n matrices and if L_A = L_B, then A = B.
In other words, if matrices A, B give rise to the same linear map, then
they are equal.


Proof. By definition, we have A_i · X = B_i · X for all i, if A_i is the i-th
row of A and B_i is the i-th row of B. Hence (A_i - B_i) · X = 0 for all i
and all X. Hence A_i - B_i = O, and A_i = B_i for all i. Hence A = B.


We can give a new interpretation for a system of homogeneous linear
equations in terms of the linear map associated with a matrix. Indeed,
such a system can be written

AX = O,

and hence we see that the set of solutions is the kernel of the linear map
L_A.


IV, §1. EXERCISES 


1. In each case, find the vector L_A(X).



IV, §2. THE MATRIX ASSOCIATED WITH A LINEAR MAP 

We first consider a special case.

Let

L: K^n -> K

be a linear map. There exists a unique vector A in K^n such that
L = L_A, i.e. such that for all X we have

L(X) = A · X.

Let E_1,...,E_n be the unit vectors in K^n. If X = x_1 E_1 + ... + x_n E_n is any
vector, then

L(X) = L(x_1 E_1 + ... + x_n E_n)
     = x_1 L(E_1) + ... + x_n L(E_n).

If we now let

a_i = L(E_i)

we see that

L(X) = x_1 a_1 + ... + x_n a_n = X · A.

This proves what we wanted. It also gives us an explicit determination
of the vector A such that L = L_A, namely the components of A are
precisely the values L(E_1),...,L(E_n), where E_i (i = 1,...,n) are the unit
vectors of K^n.

We shall now generalize this to the case of an arbitrary linear map 
into K m , not just into K. 

Theorem 2.1. Let L: K^n → K^m be a linear map. Then there exists a
unique matrix A such that L = L_A.

Proof. As usual, let E^1, ..., E^n be the unit column vectors in K^n, and let
e^1, ..., e^m be the unit column vectors in K^m. We can write any vector X
in K^n as a linear combination

X = x_1 E^1 + ... + x_n E^n = ^t(x_1, ..., x_n),

where x_j is the j-th component of X. We view E^1, ..., E^n as column
vectors. By linearity, we find that

L(X) = x_1 L(E^1) + ... + x_n L(E^n),

and we can write each L(E^j) in terms of e^1, ..., e^m. In other words, there
exist numbers a_ij such that

L(E^1) = a_11 e^1 + ... + a_m1 e^m
   ...
L(E^n) = a_1n e^1 + ... + a_mn e^m

or in terms of the column vectors,

(*)   L(E^1) = ^t(a_11, ..., a_m1), ..., L(E^n) = ^t(a_1n, ..., a_mn).

Hence

L(X) = x_1(a_11 e^1 + ... + a_m1 e^m) + ... + x_n(a_1n e^1 + ... + a_mn e^m)
     = (a_11 x_1 + ... + a_1n x_n)e^1 + ... + (a_m1 x_1 + ... + a_mn x_n)e^m.

Consequently, if we let A = (a_ij), then we see that

L(X) = AX.

Written out in full, this reads

( a_11 ... a_1n ) ( x_1 )   ( a_11 x_1 + ... + a_1n x_n )
(  ...      ... ) ( ... ) = (            ...            )
( a_m1 ... a_mn ) ( x_n )   ( a_m1 x_1 + ... + a_mn x_n )

Thus L = L_A is the linear map associated with the matrix A. We also
call A the matrix associated with the linear map L. We know that this
matrix is uniquely determined by Theorem 1.1.
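
The proof is constructive: the j-th column of A is the column vector L(E^j). As a sketch only, assuming the linear map is given as a Python function on lists (the names unit_vector, matrix_of and projection are hypothetical), the matrix can be recovered by applying the map to the unit vectors.

```python
def unit_vector(j, n):
    """The unit vector of K^n with 1 in position j (0-based) and 0 elsewhere."""
    return [1 if i == j else 0 for i in range(n)]

def matrix_of(L, n):
    """Matrix A with L = L_A: the j-th column of A is L(E^j)."""
    columns = [L(unit_vector(j, n)) for j in range(n)]
    m = len(columns[0])
    # Turn the list of columns into a list of rows.
    return [[columns[j][i] for j in range(n)] for i in range(m)]

def apply_matrix(A, X):
    """Return AX, with A given as a list of rows."""
    return [sum(a * x for a, x in zip(row, X)) for row in A]

# The projection of R^3 onto its first two coordinates, written as a function.
def projection(X):
    return [X[0], X[1]]

A = matrix_of(projection, 3)
print(A)
print(apply_matrix(A, [4, 5, 6]) == projection([4, 5, 6]))
```

The function matrix_of only makes sense when L really is linear; it does not test this.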


Example 1. Let F: R^3 → R^2 be the projection, in other words the
mapping such that F(x_1, x_2, x_3) = (x_1, x_2). Then the matrix associated
with F is

( 1 0 0 )
( 0 1 0 )

Example 2. Let I: R^n → R^n be the identity. Then the matrix associated
with I is the matrix

( 1 0 ... 0 )
( 0 1 ... 0 )
(    ...    )
( 0 0 ... 1 )

having components equal to 1 on the diagonal, and 0 otherwise.

Example 3. According to Theorem 2.1 of Chapter III, there exists a
unique linear map L: R^4 → R^2 such that

L(E^1) = ^t(2, 1), L(E^2) = ^t(3, -1), L(E^3) = ^t(-5, 4), L(E^4) = ^t(1, 7).

According to the relations (*), we see that the matrix associated with L
is the matrix

( 2  3 -5 1 )
( 1 -1  4 7 )

Example 4 (Rotations). We can define a rotation in terms of matrices.
Indeed, we call a linear map L: R^2 → R^2 a rotation if its associated
matrix can be written in the form

R(θ) = ( cos θ  -sin θ )
       ( sin θ   cos θ )

The geometric justification for this definition comes from Fig. 1.

Figure 1

We see that

L(E^1) = (cos θ)E^1 + (sin θ)E^2,
L(E^2) = (-sin θ)E^1 + (cos θ)E^2.

Thus our definition corresponds precisely to the picture. When the
matrix of the rotation is as above, we say that the rotation is by an angle θ.
For example, the matrix associated with a rotation by an angle π/2 is

R(π/2) = ( 0 -1 )
         ( 1  0 )

We observe finally that the operations on matrices correspond to the
operations on the associated linear map. For instance, if A, B are m x n
matrices, then

L_{A+B} = L_A + L_B

and if c is a number, then

L_{cA} = cL_A.


This is obvious, because 

(A + B)X = AX + BX and (cA)X = c(AX). 

Similarly for compositions of mappings. Indeed, let

F: K^n → K^m and G: K^m → K^s

be linear maps, and let A, B be the matrices associated with F and G
respectively. Then for any vector X in K^n we have

(G ∘ F)(X) = G(F(X)) = B(AX) = (BA)X.

Hence the product BA is the matrix associated with the composite linear
map G ∘ F.
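
As an illustration of the last statement, here is a small sketch using two rotations of the plane. The angles π/6 and π/3, the sample vector, and the helper names rotation, matmul, apply_matrix are assumptions made for the example. Applying F and then G to a vector should agree with applying the single matrix BA, and for rotations BA should be the rotation by the sum of the angles.

```python
import math

def rotation(theta):
    """Matrix of the rotation by the angle theta."""
    return [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta),  math.cos(theta)]]

def matmul(B, A):
    """Product BA of two matrices given as lists of rows."""
    return [[sum(B[i][k] * A[k][j] for k in range(len(A)))
             for j in range(len(A[0]))] for i in range(len(B))]

def apply_matrix(A, X):
    """Return AX, with A given as a list of rows."""
    return [sum(a * x for a, x in zip(row, X)) for row in A]

A = rotation(math.pi / 6)   # matrix associated with F
B = rotation(math.pi / 3)   # matrix associated with G
X = [1.0, 2.0]

composite = apply_matrix(B, apply_matrix(A, X))   # G(F(X)) = B(AX)
product = apply_matrix(matmul(B, A), X)           # (BA)X

print(composite, product)
```

Because the entries are floating point numbers, the two results should be compared up to rounding error, not for exact equality.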

Theorem 2.2. Let A be an n x n matrix, and let A^1, ..., A^n be its
columns. Then A is invertible if and only if A^1, ..., A^n are linearly
independent.

Proof. Suppose A^1, ..., A^n are linearly independent. Then {A^1, ..., A^n}
is a basis of K^n, so the unit vectors E^1, ..., E^n can be expressed as linear
combinations of A^1, ..., A^n. This means that there is a matrix B such
that

BA^j = E^j for j = 1, ..., n,

say by Theorem 2.1 of Chapter III. But this is equivalent to saying that
BA = I. Thus A is invertible. Conversely, suppose A is invertible. The
linear map L_A is such that

L_A(X) = AX = x_1 A^1 + ... + x_n A^n.

Since A is invertible, we must have Ker L_A = O, because if AX = O then
A^{-1}AX = X = O. Hence A^1, ..., A^n are linearly independent. This proves
the theorem.
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IV, §2. EXERCISES 

1. Find the matrix associated with the following linear maps. The vectors are
written horizontally with a transpose sign for typographical reasons.

(a) F: R^4 → R^2 given by F(^t(x_1, x_2, x_3, x_4)) = ^t(x_1, x_2) (the projection)

(b) The projection from R^4 to R^3

(c) F: R^2 → R^2 given by F(^t(x, y)) = ^t(3x, 3y)

(d) F: R^n → R^n given by F(X) = 7X

(e) F: R^n → R^n given by F(X) = -X

(f) F: R^4 → R^4 given by F(^t(x_1, x_2, x_3, x_4)) = ^t(x_1, x_2, 0, 0)

2. Find the matrix R(θ) associated with the rotation for each of the following
values of θ.

(a) π/2 (b) π/4 (c) π (d) -π (e) -π/3

(f) π/6 (g) 5π/4

3. In general, let θ > 0. What is the matrix associated with the rotation by an
angle -θ (i.e. clockwise rotation by θ)?

4. Let X = ^t(1, 2) be a point of the plane. Let F be the rotation through an
angle of π/4. What are the coordinates of F(X) relative to the usual basis
{E^1, E^2}?

5. Same question when X = ^t(-1, 3), and F is the rotation through π/2.

6. Let F: R^n → R^n be a linear map which is invertible. Show that if A is the
matrix associated with F, then A^{-1} is the matrix associated with the inverse
of F.

7. Let F be a rotation through an angle θ. Show that for any vector X in R^2
we have ||X|| = ||F(X)|| (i.e. F preserves norms), where ||(a, b)|| = √(a^2 + b^2).

8. Let c be a number, and let L: R^n → R^n be the linear map such that L(X) =
cX. What is the matrix associated with this linear map?

9. Let F_θ be rotation by an angle θ. If θ, φ are numbers, compute the matrix
of the linear map F_θ ∘ F_φ and show that it is the matrix of F_{θ+φ}.

10. Let F_θ be rotation by an angle θ. Show that F_θ is invertible, and determine
the matrix associated with F_θ^{-1}.

IV, §3. BASES, MATRICES, AND LINEAR MAPS 

In the first two sections we considered the relation between matrices and
linear maps of K^n into K^m. Now let V, W be arbitrary finite dimensional
vector spaces over K. Let

ℬ = {v_1, ..., v_n} and ℬ' = {w_1, ..., w_m}

be bases of V and W respectively. Then we know that elements of V and
W have coordinate vectors with respect to these bases. In other words, if
v ∈ V then we can express v uniquely as a linear combination

v = x_1 v_1 + ... + x_n v_n, x_i ∈ K.

Thus V is isomorphic to K^n under the map K^n → V given by

(x_1, ..., x_n) ↦ x_1 v_1 + ... + x_n v_n.

Similarly for W. If F: V → W is a linear map, then using the above
isomorphism, we can interpret F as a linear map of K^n into K^m, and
thus we can associate a matrix with F, depending on our choice of bases,
and denoted by

M^ℬ_{ℬ'}(F).

This matrix is the unique matrix A having the following property:

If X is the (column) coordinate vector of an element v of V, relative to
the basis ℬ, then AX is the (column) coordinate vector of F(v), relative
to the basis ℬ'.

To use a notation which shows that the coordinate vector X depends
on v and on the basis ℬ we let

X_ℬ(v)

denote this coordinate vector. Then the above property can be stated in
a formula.

Theorem 3.1. Let V, W be vector spaces over K, and let

F: V → W

be a linear map. Let ℬ be a basis of V and ℬ' a basis of W. If v ∈ V
then

X_{ℬ'}(F(v)) = M^ℬ_{ℬ'}(F) X_ℬ(v).

Corollary 3.2. Let V be a vector space, and let ℬ, ℬ' be bases of V.
Let v ∈ V. Then

X_{ℬ'}(v) = M^ℬ_{ℬ'}(id) X_ℬ(v).

The corollary expresses in a succinct way the manner in which the
coordinates of a vector change when we change the basis of the vector
space.

If A = M^ℬ_{ℬ'}(F), and X is the coordinate vector of v with respect to ℬ,
then by definition,

F(v) = (A_1 · X)w_1 + ... + (A_m · X)w_m.

This matrix A is determined by the effect of F on the basis elements
as follows.

Let

      F(v_1) = a_11 w_1 + ... + a_m1 w_m
(*)      ...
      F(v_n) = a_1n w_1 + ... + a_mn w_m.

Then A turns out to be the transpose of the matrix

( a_11 a_21 ... a_m1 )
( a_12 a_22 ... a_m2 )
(        ...         )
( a_1n a_2n ... a_mn )

Indeed, we have

F(v) = F(x_1 v_1 + ... + x_n v_n) = x_1 F(v_1) + ... + x_n F(v_n).

Using expression (*) for F(v_1), ..., F(v_n) we find that

F(v) = x_1(a_11 w_1 + ... + a_m1 w_m) + ... + x_n(a_1n w_1 + ... + a_mn w_m),

and after collecting the coefficients of w_1, ..., w_m, we can rewrite this
expression in the form

(a_11 x_1 + ... + a_1n x_n)w_1 + ... + (a_m1 x_1 + ... + a_mn x_n)w_m
     = (A_1 · X)w_1 + ... + (A_m · X)w_m.

This proves our assertion.
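
The transposition is easy to get wrong, so a small check may help. The sketch below assumes a hypothetical linear map with dim V = 3 and dim W = 2, given by its values on the basis elements; the coefficient values and the helper names are invented for the illustration. The rows of the coefficient array are read off from the expressions for F(v_1), F(v_2), F(v_3), and the associated matrix is the transpose of that array.

```python
def transpose(rows):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*rows)]

def apply_matrix(A, X):
    """Return AX, with A given as a list of rows."""
    return [sum(a * x for a, x in zip(row, X)) for row in A]

# Hypothetical map:
#   F(v_1) = 1*w_1 + 4*w_2
#   F(v_2) = 2*w_1 + 5*w_2
#   F(v_3) = 3*w_1 + 6*w_2
coefficients = [[1, 4],
                [2, 5],
                [3, 6]]

# The matrix associated with F is the transpose of the coefficient array.
A = transpose(coefficients)

# Coordinates of v = v_1 - v_3. By linearity,
# F(v) = F(v_1) - F(v_3) = (1 - 3)w_1 + (4 - 6)w_2.
X = [1, 0, -1]
print(A, apply_matrix(A, X))
```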

Example 1. Assume that dim V = 2 and dim W = 3. Let F be the
linear map such that

F(v_1) = 3w_1 - w_2 + 17w_3,
F(v_2) = w_1 + w_2 - w_3.

Then the matrix associated with F is the matrix

(  3  1 )
( -1  1 )
( 17 -1 )

equal to the transpose of

( 3 -1 17 )
( 1  1 -1 )

Example 2. Let id: V → V be the identity map. Then for any basis ℬ
of V we have

M^ℬ_ℬ(id) = I,

where I is the unit n x n matrix (if dim V = n). This is immediately
verified.

Warning. Assume that V = W, but that we work with two bases ℬ
and ℬ' of V which are distinct. Then the matrix associated with the
identity mapping of V into itself relative to these two distinct bases will
not be the unit matrix!

Example 3. Let ℬ = {v_1, ..., v_n} and ℬ' = {w_1, ..., w_n} be bases of the
same vector space V. There exists a matrix A = (a_ij) such that

w_1 = a_11 v_1 + ... + a_1n v_n
   ...
w_n = a_n1 v_1 + ... + a_nn v_n.

Then for each i = 1, ..., n we see that w_i = id(w_i). Hence by definition,

M^{ℬ'}_ℬ(id) = ^tA.

On the other hand, there exists a unique linear map F: V → V such that

F(v_1) = w_1, ..., F(v_n) = w_n.

Again by definition, we have

M^ℬ_ℬ(F) = ^tA.

Theorem 3.3. Let V, W be vector spaces. Let ℬ be a basis of V, and ℬ'
a basis of W. Let f, g be two linear maps of V into W. Let M = M^ℬ_{ℬ'}.
Then

M(f + g) = M(f) + M(g).

If c is a number, then

M(cf) = cM(f).

The association

f ↦ M^ℬ_{ℬ'}(f)

is an isomorphism between the space of linear maps ℒ(V, W) and the
space of m x n matrices (if dim V = n and dim W = m).

Proof. The first formulas showing that f ↦ M(f) is linear follow at
once from the definition of the associated matrix. The association
f ↦ M(f) is injective since M(f) = M(g) implies f = g, and it is
surjective since every m x n matrix is the matrix associated with some
linear map of V into W. Hence f ↦ M(f) gives an isomorphism as stated.

We now pass from the additive properties of the associated matrix to 
the multiplicative properties. 

Let U, V, W be sets. Let F: U → V be a mapping, and let G: V → W
be a mapping. Then we can form a composite mapping from U into W
as discussed previously, namely G ∘ F.

Theorem 3.4. Let V, W, U be vector spaces. Let ℬ, ℬ', ℬ'' be bases for
V, W, U respectively. Let

F: V → W and G: W → U

be linear maps. Then

M^{ℬ'}_{ℬ''}(G) M^ℬ_{ℬ'}(F) = M^ℬ_{ℬ''}(G ∘ F).

(Note. Relative to our choice of bases, the theorem expresses the fact
that composition of mappings corresponds to multiplication of matrices.)

Proof. Let A be the matrix associated with F relative to the bases ℬ, ℬ'
and let B be the matrix associated with G relative to the bases ℬ', ℬ''.
Let v be an element of V and let X be its (column) coordinate vector
relative to ℬ. Then the coordinate vector of F(v) relative to ℬ' is
AX. By definition, the coordinate vector of G(F(v)) relative to ℬ'' is
B(AX), which, by §2, is equal to (BA)X. But G(F(v)) = (G ∘ F)(v).
Hence the coordinate vector of (G ∘ F)(v) relative to the basis ℬ'' is
(BA)X. By definition, this means that BA is the matrix associated with
G ∘ F, and proves our theorem.

Remark. In many applications, one deals with linear maps of a vector
space V into itself. If a basis ℬ of V is selected, and F: V → V is a linear
map, then the matrix

M^ℬ_ℬ(F)

is usually called the matrix associated with F relative to ℬ (instead of
saying relative to ℬ, ℬ). From the definition, we see that

M^ℬ_ℬ(id) = I,

where I is the unit matrix. As a direct consequence of Theorem 3.4 we
obtain

Corollary 3.5. Let V be a vector space and ℬ, ℬ' bases of V. Then

M^ℬ_{ℬ'}(id) M^{ℬ'}_ℬ(id) = I = M^{ℬ'}_ℬ(id) M^ℬ_{ℬ'}(id).

In particular, M^ℬ_{ℬ'}(id) is invertible.

Proof. Take V = W = U in Theorem 3.4, and F = G = id and
ℬ'' = ℬ. Our assertion then drops out.

The general formula of Theorem 3.4 will allow us to describe precisely
how the matrix associated with a linear map changes when we change
bases.

Theorem 3.6. Let F: V → V be a linear map, and let ℬ, ℬ' be bases of
V. Then there exists an invertible matrix N such that

M^{ℬ'}_{ℬ'}(F) = N^{-1} M^ℬ_ℬ(F) N.

In fact, we can take

N = M^{ℬ'}_ℬ(id).

Proof. Applying Theorem 3.4 step by step, we find that

M^{ℬ'}_{ℬ'}(F) = M^ℬ_{ℬ'}(id) M^ℬ_ℬ(F) M^{ℬ'}_ℬ(id).

Corollary 3.5 implies the assertion to be proved.

Let V be a finite dimensional vector space over K, and let F: V → V
be a linear map. A basis ℬ of V is said to diagonalize F if the matrix
associated with F relative to ℬ is a diagonal matrix. If there exists such
a basis which diagonalizes F, then we say that F is diagonalizable. It is
not always true that a linear map can be diagonalized. Later in this
book, we shall find sufficient conditions under which it can. If A is an
n x n matrix in K, we say that A can be diagonalized (in K) if the linear
map on K^n represented by A can be diagonalized. From Theorem 3.6,
we conclude at once:

Theorem 3.7. Let V be a finite dimensional vector space over K, let
F: V → V be a linear map, and let M be its associated matrix relative to
a basis ℬ. Then F (or M) can be diagonalized (in K) if and only if
there exists an invertible matrix N in K such that N^{-1}MN is a
diagonal matrix.

In view of the importance of the map M ↦ N^{-1}MN, we give it a
special name. Two matrices M, M' are called similar (over a field K) if
there exists an invertible matrix N in K such that M' = N^{-1}MN.
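
To see the formula N^{-1}MN at work, here is a minimal sketch with exact rational arithmetic. The matrix M, the new basis {(1, 1), (1, -1)} of R^2 and the helper names matmul, inverse_2x2 are assumptions made for the example. The columns of N are the coordinates of the new basis vectors relative to the standard basis, and for this particular choice the similar matrix turns out to be diagonal.

```python
from fractions import Fraction

def matmul(P, Q):
    """Product PQ of two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def inverse_2x2(N):
    """Inverse of a 2 x 2 matrix with non-zero determinant."""
    (a, b), (c, d) = N
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

# Matrix of a hypothetical linear map F relative to the standard basis.
M = [[2, 1],
     [1, 2]]

# Columns of N: the new basis vectors (1, 1) and (1, -1).
N = [[1, 1],
     [1, -1]]

# Matrix of F relative to the new basis.
M_prime = matmul(matmul(inverse_2x2(N), M), N)
print(M_prime)
```

The inverse is computed here by the explicit 2 x 2 formula only to keep the sketch short; nothing in the text depends on that choice.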


IV, §3. EXERCISES 

1. In each one of the following cases, find M^ℬ_{ℬ'}(id). The vector space in each
case is R^3.

(a) ℬ = {(1, 1, 0), (-1, 1, 1), (0, 1, 2)}
    ℬ' = {(2, 1, 1), (0, 0, 1), (-1, 1, 1)}

(b) ℬ = {(3, 2, 1), (0, -2, 5), (1, 1, 2)}
    ℬ' = {(1, 1, 0), (-1, 2, 4), (2, -1, 1)}

2. Let L: V → V be a linear map. Let ℬ = {v_1, ..., v_n} be a basis of V. Suppose
that there are numbers c_1, ..., c_n such that L(v_i) = c_i v_i for i = 1, ..., n. What is
M^ℬ_ℬ(L)?

3. For each real number θ, let F_θ: R^2 → R^2 be the linear map represented by the
matrix

R(θ) = ( cos θ  -sin θ )
       ( sin θ   cos θ )

Show that if θ, θ' are real numbers, then F_θ F_θ' = F_{θ+θ'}. (You must use the
addition formula for sine and cosine.) Also show that F_θ^{-1} = F_{-θ}.

4. In general, let θ > 0. What is the matrix associated with the identity map,
and rotation of bases by an angle -θ (i.e. clockwise rotation by θ)?

5. Let X = ^t(1, 2) be a point of the plane. Let F be the rotation through an
angle of π/4. What are the coordinates of F(X) relative to the usual basis
{E^1, E^2}?

6. Same question when X = ^t(-1, 3), and F is the rotation through π/2.

7. In general, let F be the rotation through an angle θ. Let (x, y) be a point of
the plane in the standard coordinate system. Let (x', y') be the coordinates of
this point in the rotated system. Express x', y' in terms of x, y, and θ.

8. In each of the following cases, let D = d/dt be the derivative. We give a set
of linearly independent functions ℬ. These generate a vector space V, and D
is a linear map from V into itself. Find the matrix associated with D relative
to the bases ℬ, ℬ.

(a) {e^t, e^{2t}}

(b) {1, t}

(c) {e^t, te^t}

(d) {1, t, t^2}

(e) {1, t, e^t, e^{2t}, te^{2t}}

(f) {sin t, cos t}

9. (a) Let N be a square matrix. We say that N is nilpotent if there exists a
positive integer r such that N^r = O. Prove that if N is nilpotent, then
I - N is invertible.

(b) State and prove the analogous statement for linear maps of a vector
space into itself.

10. Let P_n be the vector space of polynomials of degree ≤ n. Then the derivative
D: P_n → P_n is a linear map of P_n into itself. Let I be the identity mapping.
Prove that the following linear maps are invertible:

(a) I - D^2.

(b) D^m - I for any positive integer m.

(c) D^m - cI for any number c ≠ 0.

11. Let A be the n x n matrix

( 0 1 0 ... 0 )
( 0 0 1 ... 0 )
(     ...     )
( 0 0 0 ... 1 )
( 0 0 0 ... 0 )

which is upper triangular, with zeros on the diagonal, 1 just above the
diagonal, and zeros elsewhere as shown.

(a) How would you describe the effect of L_A on the standard basis vectors
{E^1, ..., E^n} of K^n?

(b) Show that A^n = O and A^{n-1} ≠ O by using the effect of powers of A on
the basis vectors.



CHAPTER V 


Scalar Products 
and Orthogonality 


V, §1. SCALAR PRODUCTS 

Let V be a vector space over a field K. A scalar product on V is an
association which to any pair of elements v, w of V associates a scalar,
denoted by ⟨v, w⟩, or also v · w, satisfying the following properties:

SP 1. We have ⟨v, w⟩ = ⟨w, v⟩ for all v, w ∈ V.

SP 2. If u, v, w are elements of V, then

⟨u, v + w⟩ = ⟨u, v⟩ + ⟨u, w⟩.

SP 3. If x ∈ K, then

⟨xu, v⟩ = x⟨u, v⟩ and ⟨u, xv⟩ = x⟨u, v⟩.

The scalar product is said to be non-degenerate if in addition it also
satisfies the condition:

If v is an element of V, and ⟨v, w⟩ = 0 for all w ∈ V, then v = O.

Example 1. Let V = K^n. Then the map

(X, Y) ↦ X · Y,

which to elements X, Y ∈ K^n associates their dot product as we defined it
previously, is a scalar product in the present sense.

Example 2. Let V be the space of continuous real-valued functions on
the interval [0, 1]. If f, g ∈ V, we define

⟨f, g⟩ = ∫_0^1 f(t)g(t) dt.

Simple properties of the integral show that this is a scalar product.

In both examples the scalar product is non-degenerate. We had
pointed this out previously for the dot product of vectors in K^n. In the
second example, it is also easily shown from simple properties of the
integral.

In calculus, we study the second example, which gives rise to the
theory of Fourier series. Here we discuss only general properties of scalar
products and applications to Euclidean spaces. The notation ⟨ , ⟩ is
used because in dealing with vector spaces of functions, a dot f · g may
be confused with the ordinary product of functions.

Let V be a vector space with a scalar product. As always, we define
elements v, w of V to be orthogonal or perpendicular, and write v ⊥ w, if
⟨v, w⟩ = 0. If S is a subset of V, we denote by S^⊥ the set of all elements
w ∈ V which are perpendicular to all elements of S, i.e. ⟨w, v⟩ = 0 for all
v ∈ S. Then using SP 2 and SP 3, one verifies at once that S^⊥ is a
subspace of V, called the orthogonal space of S. If w is perpendicular to S,
we also write w ⊥ S. Let U be the subspace of V generated by the
elements of S. If w is perpendicular to S, and if v_1, v_2 ∈ S, then

⟨w, v_1 + v_2⟩ = ⟨w, v_1⟩ + ⟨w, v_2⟩ = 0.

If c is a scalar, then

⟨w, cv_1⟩ = c⟨w, v_1⟩.

Hence w is perpendicular to linear combinations of elements of S, and
hence w is perpendicular to U.


Example 3. Let (a_ij) be an m x n matrix in K, and let A_1, ..., A_m be its
row vectors. Let X = ^t(x_1, ..., x_n) as usual. The system of homogeneous
linear equations

       a_11 x_1 + ... + a_1n x_n = 0
(**)      ...
       a_m1 x_1 + ... + a_mn x_n = 0

can also be written in an abbreviated form, namely

A_1 · X = 0, ..., A_m · X = 0.

The set of solutions X of this homogeneous system is a vector space
over K. In fact, let W be the space generated by A_1, ..., A_m. Let U be
the space consisting of all vectors in K^n perpendicular to A_1, ..., A_m.
Then U is precisely the vector space of solutions of (**). The vectors
A_1, ..., A_m may not be linearly independent. We note that dim W ≤ m,
and we call

dim U = dim W^⊥

the dimension of the space of solutions of the system of linear equations.
We shall discuss this dimension at greater length later.

Let V again be a vector space over the field K, with a scalar product.
Let {v_1, ..., v_n} be a basis of V. We shall say that it is an orthogonal
basis if ⟨v_i, v_j⟩ = 0 for all i ≠ j. We shall show later that if V is a finite
dimensional vector space, with a scalar product, then there always exists
an orthogonal basis. However, we shall first discuss important special
cases over the real and complex numbers.


The real positive definite case 

Let V be a vector space over R, with a scalar product. We shall call this
scalar product positive definite if ⟨v, v⟩ ≥ 0 for all v ∈ V, and ⟨v, v⟩ > 0 if
v ≠ O. The ordinary dot product of vectors in R^n is positive definite, and
so is the scalar product of Example 2 above.

Let V be a vector space over R, with a positive definite scalar product
denoted by ⟨ , ⟩. Let W be a subspace. Then W has a scalar product
defined by the same rule defining the scalar product in V. In other
words, if w, w' are elements of W, we may form their product ⟨w, w'⟩.
This scalar product on W is obviously positive definite.

For instance, if W is the subspace of R^3 generated by the two vectors
(1, 2, 2) and (π, -1, 0), then W is a vector space in its own right, and we
can take the dot product of vectors lying in W to define a positive
definite scalar product on W. We often have to deal with such subspaces,
and this is one reason why we develop our theory on arbitrary (finite
dimensional) spaces over R with a given positive definite scalar product,
instead of working only on R^n with the dot product. Another reason is
that we wish our theory to apply to situations as described in Example 2
of §1.

We define the norm of an element v ∈ V by

||v|| = √⟨v, v⟩.

If c is any number, then we immediately get

||cv|| = |c| ||v||,

because

||cv|| = √⟨cv, cv⟩ = √(c^2 ⟨v, v⟩) = |c| ||v||.

The distance between two elements v, w of V is defined to be

dist(v, w) = ||v - w||.

This definition stems from the Pythagoras theorem. For example,
suppose V = R^3 with the usual dot product as the scalar product. If
X = (x, y, z) ∈ V then

||X|| = √(x^2 + y^2 + z^2).

This coincides precisely with our notion of distance from the origin O to
the point X by making use of Pythagoras' theorem.

We can also justify our definition of perpendicularity. Again the
intuition of plane geometry and the following figure tell us that v is
perpendicular to w if and only if

||v - w|| = ||v + w||.

Figure 1 (panels (a) and (b))

But then by algebra:

||v - w|| = ||v + w||  ⇔  ||v - w||^2 = ||v + w||^2
                       ⇔  (v - w)^2 = (v + w)^2
                       ⇔  v^2 - 2v · w + w^2 = v^2 + 2v · w + w^2
                       ⇔  4v · w = 0
                       ⇔  v · w = 0.

This is the desired justification.

You probably have studied the dot product of n-tuples in a previous
course. Basic properties which were proved without coordinates can be
proved for our more general scalar product. We shall carry such proofs
out, and meet other examples as we go along.

We say that an element v ∈ V is a unit vector if ||v|| = 1. If v ∈ V and
v ≠ O, then v/||v|| is a unit vector.

The following two identities follow directly from the definition of the
length.

The Pythagoras theorem. If v, w are perpendicular, then

||v + w||^2 = ||v||^2 + ||w||^2.

The parallelogram law. For any v, w we have

||v + w||^2 + ||v - w||^2 = 2||v||^2 + 2||w||^2.

The proofs are trivial. We give the first, and leave the second as an
exercise. For the first, we have

||v + w||^2 = ⟨v + w, v + w⟩ = ⟨v, v⟩ + 2⟨v, w⟩ + ⟨w, w⟩
            = ||v||^2 + ||w||^2 because v ⊥ w.

This proves Pythagoras.

Let w be an element of V such that ||w|| ≠ 0. For any v there exists a
unique number c such that v - cw is perpendicular to w. Indeed, for
v - cw to be perpendicular to w we must have

⟨v - cw, w⟩ = 0,

whence ⟨v, w⟩ - ⟨cw, w⟩ = 0 and ⟨v, w⟩ = c⟨w, w⟩. Thus

c = ⟨v, w⟩ / ⟨w, w⟩.

Conversely, letting c have this value shows that v - cw is perpendicular
to w. We call c the component of v along w. We call cw the projection of
v along w.
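
For the ordinary dot product, the component and the projection can be computed directly. The following sketch uses exact fractions; the vectors v, w and the function names dot and component are hypothetical choices for the illustration. It computes c = ⟨v, w⟩ / ⟨w, w⟩ and checks that v - cw is perpendicular to w.

```python
from fractions import Fraction

def dot(v, w):
    """Ordinary dot product of two vectors of the same length."""
    return sum(a * b for a, b in zip(v, w))

def component(v, w):
    """Component of v along w, assuming w has non-zero norm."""
    ww = dot(w, w)
    if ww == 0:
        raise ValueError("w must have non-zero norm")
    return Fraction(dot(v, w), ww)

v = [3, 4, 0]
w = [1, 1, 1]

c = component(v, w)
projection = [c * b for b in w]                    # the vector cw
remainder = [a - p for a, p in zip(v, projection)]  # the vector v - cw

print(c, projection, dot(remainder, w))
```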
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As with the case of n-space, we define the projection of v along w to 
be the vector cw, because of our usual picture: 



Figure 2 


In particular, if w is a unit vector, then the component of v along w is
simply

c = ⟨v, w⟩.

Example 4. Let V = R^n with the usual scalar product, i.e. the dot
product. If E_i is the i-th unit vector, and X = (x_1, ..., x_n) then the
component of X along E_i is simply

X · E_i = x_i,

that is, the i-th component of X.

Example 5. Let V be the space of continuous functions on [-π, π].
Let f be the function given by f(x) = sin kx, where k is some integer > 0.
Then

||f|| = √⟨f, f⟩ = (∫_{-π}^{π} sin^2 kx dx)^{1/2} = √π.

In the present example of a vector space of functions, the component
of g along f is called the Fourier coefficient of g with respect to f. If g is
any continuous function on [-π, π], then the Fourier coefficient of g
with respect to f is

⟨g, f⟩ / ⟨f, f⟩ = (1/π) ∫_{-π}^{π} g(x) sin kx dx.

Theorem 1.1. Schwarz inequality. For all v, w ∈ V we have

|⟨v, w⟩| ≤ ||v|| ||w||.

Proof. If w = O, then both sides are equal to 0 and our inequality is
obvious. Next, assume that w = e is a unit vector, that is e ∈ V and
||e|| = 1. If c is the component of v along e, then v - ce is perpendicular
to e, and also perpendicular to ce. Hence by the Pythagoras theorem,
we find

||v||^2 = ||v - ce||^2 + ||ce||^2
        = ||v - ce||^2 + c^2.

Hence c^2 ≤ ||v||^2, so that |c| ≤ ||v||. Finally, if w is arbitrary ≠ O, then
e = w/||w|| is a unit vector, so that by what we just saw,

|⟨v, w/||w||⟩| ≤ ||v||.

This yields

|⟨v, w⟩| ≤ ||v|| ||w||,

as desired.

Theorem 1.2. Triangle inequality. If v, w ∈ V, then

||v + w|| ≤ ||v|| + ||w||.

Proof. Each side of this inequality is positive or 0. Hence it will
suffice to prove that their squares satisfy the desired inequality, in other
words

(v + w)^2 ≤ (||v|| + ||w||)^2.

To do this we have:

(v + w)^2 = (v + w) · (v + w) = v^2 + 2v · w + w^2
          ≤ ||v||^2 + 2||v|| ||w|| + ||w||^2 (by Theorem 1.1)
          = (||v|| + ||w||)^2,

thus proving the triangle inequality.
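
Both inequalities can be observed on a numerical sample. This is only a sanity check on hypothetical vectors of R^3 with the ordinary dot product, not a substitute for the proofs; the names dot and norm are ours.

```python
import math

def dot(v, w):
    """Ordinary dot product of two vectors of the same length."""
    return sum(a * b for a, b in zip(v, w))

def norm(v):
    """Norm attached to the dot product."""
    return math.sqrt(dot(v, v))

v = [1.0, -2.0, 3.0]
w = [4.0, 1.0, 2.0]
s = [a + b for a, b in zip(v, w)]   # the vector v + w

# Schwarz: ||v|| ||w|| is at least |v . w|
schwarz_holds = norm(v) * norm(w) >= abs(dot(v, w))

# Triangle: ||v|| + ||w|| is at least ||v + w||
triangle_holds = norm(v) + norm(w) >= norm(s)

print(schwarz_holds, triangle_holds)
```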

Let v_1, ..., v_n be non-zero elements of V which are mutually
perpendicular, that is ⟨v_i, v_j⟩ = 0 if i ≠ j. Let c_i be the component of v along v_i.
Then

v - c_1 v_1 - ... - c_n v_n

is perpendicular to v_1, ..., v_n. To see this, all we have to do is to take the
product with v_j for any j. All the terms involving ⟨v_i, v_j⟩ will give 0 if
i ≠ j, and we shall have two remaining terms

⟨v, v_j⟩ - c_j⟨v_j, v_j⟩

which cancel. Thus subtracting linear combinations as above
orthogonalizes v with respect to v_1, ..., v_n. The next theorem shows that
c_1 v_1 + ... + c_n v_n gives the closest approximation to v as a linear
combination of v_1, ..., v_n.

Theorem 1.3. Let v_1, ..., v_n be vectors which are mutually perpendicular,
and such that ||v_i|| ≠ 0 for all i. Let v be an element of V, and let c_i be
the component of v along v_i. Let a_1, ..., a_n be numbers. Then

||v - Σ_{k=1}^{n} c_k v_k|| ≤ ||v - Σ_{k=1}^{n} a_k v_k||.

Proof. We know that

v - Σ_{k=1}^{n} c_k v_k

is perpendicular to each v_i, i = 1, ..., n. Hence it is perpendicular to any
linear combination of v_1, ..., v_n. Now we have:

||v - Σ a_k v_k||^2 = ||v - Σ c_k v_k + Σ (c_k - a_k)v_k||^2
                    = ||v - Σ c_k v_k||^2 + ||Σ (c_k - a_k)v_k||^2

by the Pythagoras theorem. This proves that

||v - Σ c_k v_k||^2 ≤ ||v - Σ a_k v_k||^2,

and thus our theorem is proved.

The next theorem is known as the Bessel inequality.

Theorem 1.4. If v_1, ..., v_n are mutually perpendicular unit vectors, and if
c_i is the component of v along v_i, then

Σ_{i=1}^{n} c_i^2 ≤ ||v||^2.

Proof. The elements v - Σ c_i v_i, v_1, ..., v_n are mutually perpendicular.
Therefore:

||v||^2 = ||v - Σ c_i v_i||^2 + ||Σ c_i v_i||^2   by Pythagoras
        ≥ ||Σ c_i v_i||^2                         because a norm is ≥ 0
        = Σ c_i^2                                  by Pythagoras

because v_1, ..., v_n are mutually perpendicular and ||v_i||^2 = 1. This proves
the theorem.
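
The closest approximation property and the Bessel inequality can both be seen on a small exact example. The sketch assumes the ordinary dot product on R^3 and two hypothetical perpendicular unit vectors v1, v2; all names and values are chosen for the illustration only.

```python
from fractions import Fraction

def dot(v, w):
    """Ordinary dot product of two vectors of the same length."""
    return sum(a * b for a, b in zip(v, w))

# Two mutually perpendicular unit vectors in R^3 (hypothetical values).
v1 = [Fraction(3, 5), Fraction(4, 5), 0]
v2 = [Fraction(-4, 5), Fraction(3, 5), 0]
v = [1, 2, 3]

# Components of v along v1 and v2 (no division needed for unit vectors).
c = [dot(v, v1), dot(v, v2)]

# The combination c_1 v_1 + c_2 v_2 and the remainder v - c_1 v_1 - c_2 v_2.
approx = [c[0] * a + c[1] * b for a, b in zip(v1, v2)]
remainder = [x - y for x, y in zip(v, approx)]

# Any other choice of coefficients, here a_1 = 2 and a_2 = 0, does no better.
other = [x - 2 * a for x, a in zip(v, v1)]

# Left-hand side of the Bessel inequality.
bessel_sum = sum(ci * ci for ci in c)

print(approx, dot(remainder, remainder), dot(other, other), bessel_sum, dot(v, v))
```

In this sample the squared norm of v splits as the Bessel sum plus the squared norm of the remainder, which is the first line of the proof of Theorem 1.4.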


V, §1. EXERCISES 


1. Let V be a vector space with a scalar product. Show that ⟨O, v⟩ = 0 for all v
in V.

2. Assume that the scalar product is positive definite. Let v_1, ..., v_n be non-zero
elements which are mutually perpendicular, that is ⟨v_i, v_j⟩ = 0 if i ≠ j. Show
that they are linearly independent.

3. Let M be a square n x n matrix which is equal to its transpose. If X, Y are
column n-vectors, then

^tX M Y

is a 1 x 1 matrix, which we identify with a number. Show that the map

(X, Y) ↦ ^tX M Y

satisfies the three properties SP 1, SP 2, SP 3. Give an example of a 2 x 2
matrix M such that the product is not positive definite.


V, §2. ORTHOGONAL BASES, POSITIVE DEFINITE CASE 

Let V be a vector space with a positive definite scalar product
throughout this section. A basis {v_1, ..., v_n} of V is said to be orthogonal if its
elements are mutually perpendicular, i.e. ⟨v_i, v_j⟩ = 0 whenever i ≠ j. If in
addition each element of the basis has norm 1, then the basis is called
orthonormal.

The standard unit vectors of R^n form an orthonormal basis of R^n,
with respect to the ordinary dot product.

Theorem 2.1. Let V be a finite dimensional vector space, with a positive
definite scalar product. Let W be a subspace of V, and let {w_1, ..., w_m}
be an orthogonal basis of W. If W ≠ V, then there exist elements
w_{m+1}, ..., w_n of V such that {w_1, ..., w_n} is an orthogonal basis of V.

Proof. The method of proof is as important as the theorem, and is
called the Gram-Schmidt orthogonalization process. We know from
Chapter II, §3 that we can find elements v_{m+1}, ..., v_n of V such that

{w_1, ..., w_m, v_{m+1}, ..., v_n}

is a basis of V. Of course, it is not an orthogonal basis. Let W_{m+1} be
the space generated by w_1, ..., w_m, v_{m+1}. We shall first obtain an
orthogonal basis of W_{m+1}. The idea is to take v_{m+1} and subtract from it its
projection along w_1, ..., w_m. Thus we let

c_1 = ⟨v_{m+1}, w_1⟩ / ⟨w_1, w_1⟩, ..., c_m = ⟨v_{m+1}, w_m⟩ / ⟨w_m, w_m⟩.

Let

w_{m+1} = v_{m+1} - c_1 w_1 - ... - c_m w_m.

Then w_{m+1} is perpendicular to w_1, ..., w_m. Furthermore, w_{m+1} ≠ O
(otherwise v_{m+1} would be linearly dependent on w_1, ..., w_m), and v_{m+1} lies
in the space generated by w_1, ..., w_{m+1} because

v_{m+1} = w_{m+1} + c_1 w_1 + ... + c_m w_m.

Hence {w_1, ..., w_{m+1}} is an orthogonal basis of W_{m+1}. We can now
proceed by induction, showing that the space W_{m+s} generated by

w_1, ..., w_m, v_{m+1}, ..., v_{m+s}

has an orthogonal basis

{w_1, ..., w_{m+1}, ..., w_{m+s}}

with s = 1, ..., n - m. This concludes the proof.

Corollary 2.2. Let V be a finite dimensional vector space with a positive
definite scalar product. Assume that V ≠ {O}. Then V has an
orthogonal basis.

Proof. By hypothesis, there exists an element v_1 of V such that v_1 ≠ O.
We let W be the subspace generated by v_1, and apply the theorem to get
the desired basis.

We summarize the procedure of Theorem 2.1 once more. Suppose we
are given an arbitrary basis {v_1, ..., v_n} of V. We wish to orthogonalize it.
We proceed as follows. We let

v'_1 = v_1,

v'_2 = v_2 - (⟨v_2, v'_1⟩ / ⟨v'_1, v'_1⟩) v'_1,

v'_3 = v_3 - (⟨v_3, v'_2⟩ / ⟨v'_2, v'_2⟩) v'_2 - (⟨v_3, v'_1⟩ / ⟨v'_1, v'_1⟩) v'_1,

   ...

v'_n = v_n - (⟨v_n, v'_{n-1}⟩ / ⟨v'_{n-1}, v'_{n-1}⟩) v'_{n-1} - ... - (⟨v_n, v'_1⟩ / ⟨v'_1, v'_1⟩) v'_1.

Then {v'_1, ..., v'_n} is an orthogonal basis.

Given an orthogonal basis, we can always obtain an orthonormal
basis by dividing each vector by its norm.


Example 1. Find an orthonormal basis for the vector space generated
by the vectors (1, 1, 0, 1), (1, -2, 0, 0), and (1, 0, -1, 2).

Let us denote these vectors by A, B, C. Let

$$B' = B - \frac{B \cdot A}{A \cdot A}\, A.$$

In other words, we subtract from B its projection along A. Then B' is
perpendicular to A. We find

$$B' = \tfrac{1}{3}(4, -5, 0, 1).$$

Now we subtract from C its projection along A and B', and thus we let

$$C' = C - \frac{C \cdot A}{A \cdot A}\, A - \frac{C \cdot B'}{B' \cdot B'}\, B'.$$

Since A and B' are perpendicular, taking the scalar product of C' with A
and B' shows that C' is perpendicular to both A and B'. We find

$$C' = \tfrac{1}{7}(-4, -2, -7, 6).$$


The vectors A, B', C' are non-zero and mutually perpendicular. They lie
in the space generated by A, B, C. Hence they constitute an orthogonal
basis for that space. If we wish an orthonormal basis, then we divide
these vectors by their norm, and thus obtain

$$\frac{A}{\|A\|} = \frac{1}{\sqrt{3}}(1, 1, 0, 1), \qquad
\frac{B'}{\|B'\|} = \frac{1}{\sqrt{42}}(4, -5, 0, 1), \qquad
\frac{C'}{\|C'\|} = \frac{1}{\sqrt{105}}(-4, -2, -7, 6),$$

as an orthonormal basis.
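
The orthogonalization procedure can be sketched in a few lines of Python.
This is only an illustration under our own assumptions: the names `dot` and
`gram_schmidt` are ours, the scalar product is the ordinary dot product, and
exact rational arithmetic is used so that the result can be compared with
Example 1.

```python
import math
from fractions import Fraction


def dot(x, y):
    """Ordinary dot product of two vectors of the same length."""
    return sum(a * b for a, b in zip(x, y))


def gram_schmidt(vectors):
    """Orthogonalize linearly independent vectors.

    From each v we subtract its projections along the vectors
    already constructed, as in the summary of Theorem 2.1.
    """
    ortho = []
    for v in vectors:
        v = [Fraction(a) for a in v]
        w = list(v)
        for u in ortho:
            c = dot(v, u) / dot(u, u)  # component of v along u
            w = [wi - c * ui for wi, ui in zip(w, u)]
        ortho.append(w)
    return ortho


# The vectors A, B, C of Example 1.
basis = gram_schmidt([(1, 1, 0, 1), (1, -2, 0, 0), (1, 0, -1, 2)])

# Dividing by the norm gives an orthonormal basis (floating point).
orthonormal = [[float(a) / math.sqrt(dot(w, w)) for a in w] for w in basis]
```

Here `basis[1]` and `basis[2]` are the vectors B' and C' of the example,
written with fractional coordinates.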

Theorem 2.3. Let V be a vector space over R with a positive definite
scalar product, of dimension n. Let W be a subspace of V of dimension
r. Let $W^\perp$ be the subspace of V consisting of all elements which are
perpendicular to W. Then V is the direct sum of W and $W^\perp$, and
$W^\perp$ has dimension n - r. In other words,

$$\dim W + \dim W^\perp = \dim V.$$

Proof. If W consists of O alone, or if W = V, then our assertion is
obvious. We therefore assume that $W \neq V$ and that $W \neq \{O\}$. Let
$\{w_1, \ldots, w_r\}$ be an orthonormal basis of W. By Theorem 2.1, there
exist elements $u_{r+1}, \ldots, u_n$ of V such that

$$\{w_1, \ldots, w_r, u_{r+1}, \ldots, u_n\}$$

is an orthonormal basis of V. We shall prove that $\{u_{r+1}, \ldots, u_n\}$
is an orthonormal basis of $W^\perp$.

Let u be an element of $W^\perp$. Then there exist numbers $x_1, \ldots, x_n$
such that

$$u = x_1 w_1 + \cdots + x_r w_r + x_{r+1} u_{r+1} + \cdots + x_n u_n.$$

Since u is perpendicular to W, taking the product with any $w_i$
(i = 1, ..., r), we find

$$0 = \langle u, w_i\rangle = x_i \langle w_i, w_i\rangle = x_i.$$

Hence all $x_i = 0$ (i = 1, ..., r). Therefore u is a linear combination of
$u_{r+1}, \ldots, u_n$.

Conversely, let $u = x_{r+1} u_{r+1} + \cdots + x_n u_n$ be a linear
combination of $u_{r+1}, \ldots, u_n$. Taking the product with any $w_i$
yields 0. Hence u is perpendicular to all $w_i$ (i = 1, ..., r), and hence is
perpendicular to W. This proves that $u_{r+1}, \ldots, u_n$ generate
$W^\perp$. Since they are mutually perpendicular, and of norm 1, they form
an orthonormal basis of $W^\perp$, whose dimension is therefore n - r.
Furthermore, an element of V has a unique expression as a linear
combination

$$x_1 w_1 + \cdots + x_r w_r + x_{r+1} u_{r+1} + \cdots + x_n u_n,$$

and hence a unique expression as a sum w + u with $w \in W$ and
$u \in W^\perp$. Hence V is the direct sum of W and $W^\perp$.

The space $W^\perp$ is called the orthogonal complement of W.

Example 2. Consider $\mathbf{R}^3$. Let A, B be two linearly independent
vectors in $\mathbf{R}^3$. Then the space of vectors which are perpendicular
to both A and B is a 1-dimensional space. If {N} is a basis for this space,
any other basis for this space is of type {tN}, where t is a number
$\neq 0$.

Again in $\mathbf{R}^3$, let N be a non-zero vector. The space of vectors
perpendicular to N is a 2-dimensional space, i.e. a plane, passing through
the origin O.

Let V be a finite dimensional vector space over R, with a positive
definite scalar product. Let $\{e_1, \ldots, e_n\}$ be an orthonormal basis.
Let $v, w \in V$. There exist numbers $x_1, \ldots, x_n \in \mathbf{R}$ and
$y_1, \ldots, y_n \in \mathbf{R}$ such that

$$v = x_1 e_1 + \cdots + x_n e_n \quad \text{and} \quad w = y_1 e_1 + \cdots + y_n e_n.$$

Then

$$\langle v, w\rangle = \langle x_1 e_1 + \cdots + x_n e_n,\; y_1 e_1 + \cdots + y_n e_n\rangle
= \sum_{i,j=1}^{n} x_i y_j \langle e_i, e_j\rangle = x_1 y_1 + \cdots + x_n y_n.$$

Thus in terms of this orthonormal basis, if X, Y are the coordinate
vectors of v and w respectively, the scalar product is given by the ordinary
dot product $X \cdot Y$ of the coordinate vectors. This is definitely not the
case if we deal with a basis which is not orthonormal. If
$\{v_1, \ldots, v_n\}$ is any basis of V, and we write

$$v = x_1 v_1 + \cdots + x_n v_n, \qquad w = y_1 v_1 + \cdots + y_n v_n$$

in terms of the basis, then

$$\langle v, w\rangle = \sum_{i,j=1}^{n} x_i y_j \langle v_i, v_j\rangle.$$

Each $\langle v_i, v_j\rangle$ is a number. If we let
$a_{ij} = \langle v_i, v_j\rangle$, then

$$\langle v, w\rangle = \sum_{i,j=1}^{n} a_{ij} x_i y_j.$$
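
To see the role of the numbers $a_{ij}$ concretely, here is a small Python
sketch. The basis (1, 0), (1, 1) of $\mathbf{R}^2$ and the coordinate vectors
are hypothetical choices of ours, and the helper names are not from the text.
The sketch compares the double sum $\sum a_{ij} x_i y_j$ with the scalar
product computed directly, and with the dot product of the coordinates.

```python
def dot(x, y):
    """Ordinary dot product."""
    return sum(a * b for a, b in zip(x, y))


def gram_matrix(basis):
    """Matrix of the numbers a_ij = <v_i, v_j>."""
    return [[dot(vi, vj) for vj in basis] for vi in basis]


def combine(coords, basis):
    """Return x_1 v_1 + ... + x_n v_n."""
    size = len(basis[0])
    return [sum(c * b[k] for c, b in zip(coords, basis)) for k in range(size)]


def product_in_coordinates(a, x, y):
    """Double sum of a_ij x_i y_j."""
    return sum(a[i][j] * x[i] * y[j]
               for i in range(len(x)) for j in range(len(y)))


basis = [(1, 0), (1, 1)]          # a basis which is not orthonormal
a = gram_matrix(basis)
x, y = (2, 3), (1, -1)            # coordinate vectors of v and w
v, w = combine(x, basis), combine(y, basis)

direct = dot(v, w)                          # scalar product of v and w
via_matrix = product_in_coordinates(a, x, y)
naive = dot(x, y)                           # dot product of the coordinates
```

In this example `direct` and `via_matrix` agree, while `naive` gives a
different number, which illustrates why the basis has to be orthonormal for
the simple formula to hold.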


Hermitian products 

We shall now describe the modification necessary to adapt the preceding 
results to vector spaces over the complex numbers. We wish to preserve 
the notion of a positive definite scalar product as far as possible. Since 
the dot product of vectors with complex coordinates may be equal to 0 
without the vectors being equal to 0 , we must change something in the 
definition. It turns out that the needed change is very slight. 

Let V be a vector space over the complex numbers. A hermitian product
on V is a rule which to any pair of elements v, w of V associates a
complex number, denoted again by $\langle v, w\rangle$, satisfying the
following properties:

HP 1. We have $\langle v, w\rangle = \overline{\langle w, v\rangle}$ for all
$v, w \in V$. (Here the bar denotes complex conjugate.)

HP 2. If u, v, w are elements of V, then

$$\langle u, v + w\rangle = \langle u, v\rangle + \langle u, w\rangle.$$

HP 3. If $\alpha \in \mathbf{C}$, then

$$\langle \alpha u, v\rangle = \alpha \langle u, v\rangle \quad \text{and} \quad
\langle u, \alpha v\rangle = \bar{\alpha} \langle u, v\rangle.$$

The hermitian product is said to be positive definite if
$\langle v, v\rangle \geq 0$ for all $v \in V$, and $\langle v, v\rangle > 0$
if $v \neq O$.

We define the words orthogonal, perpendicular, orthogonal basis,
orthogonal complement as before. There is nothing to change either in our
definition of component and projection of v along w, or in the remarks
which we made concerning these.

Example 3. Let $V = \mathbf{C}^n$. If $X = (x_1, \ldots, x_n)$ and
$Y = (y_1, \ldots, y_n)$ are vectors in $\mathbf{C}^n$, we define their
hermitian product to be

$$\langle X, Y\rangle = x_1 \bar{y}_1 + \cdots + x_n \bar{y}_n.$$

Conditions HP 1, HP 2 and HP 3 are immediately verified. This product
is positive definite because if $X \neq O$, then some $x_i \neq 0$, and
$x_i \bar{x}_i > 0$. Hence $\langle X, X\rangle > 0$.

Note however that if X = (1, i) then

$$X \cdot X = 1 - 1 = 0.$$
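
The contrast between the two products can be checked with Python's built-in
complex numbers. The following is a minimal sketch; the function names `dot`
and `hermitian` and the second vector `Y` are our own choices for
illustration.

```python
def dot(x, y):
    """Ordinary dot product, without conjugation."""
    return sum(a * b for a, b in zip(x, y))


def hermitian(x, y):
    """Hermitian product of Example 3: sum of x_i times conj(y_i)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


X = (1 + 0j, 1j)          # the vector (1, i)
Y = (2 - 1j, 3 + 0j)      # an arbitrary second vector

dot_XX = dot(X, X)              # vanishes although X is non-zero
herm_XX = hermitian(X, X)       # positive

# HP 1: <X, Y> is the complex conjugate of <Y, X>.
hp1_holds = hermitian(X, Y) == hermitian(Y, X).conjugate()
```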


Example 4. Let V be the space of continuous complex-valued functions
on the interval $[-\pi, \pi]$. If $f, g \in V$, we define

$$\langle f, g\rangle = \int_{-\pi}^{\pi} f(t)\, \overline{g(t)}\, dt.$$

Standard properties of the integral again show that this is a hermitian
product which is positive definite. Let $f_n$ be the function such that

$$f_n(t) = e^{int}.$$

A simple computation shows that $f_n$ is orthogonal to $f_m$ if n, m are
distinct integers. Furthermore, we have

$$\langle f_n, f_n\rangle = \int_{-\pi}^{\pi} e^{int} e^{-int}\, dt = 2\pi.$$

If $f \in V$, then its Fourier coefficient with respect to $f_n$ is
therefore equal to

$$\frac{\langle f, f_n\rangle}{\langle f_n, f_n\rangle}
= \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t)\, e^{-int}\, dt,$$

which a reader acquainted with analysis will immediately recognize.
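
These relations can be checked numerically. The sketch below approximates
the integral by a midpoint rule; the number of steps, the helper names and
the test function $3e^{2it} + e^{-it}$ are assumptions of ours, chosen only
for illustration.

```python
import cmath
import math


def hermitian_integral(f, g, steps=2000):
    """Midpoint-rule approximation of the integral over [-pi, pi]
    of f(t) times the complex conjugate of g(t)."""
    h = 2 * math.pi / steps
    total = 0j
    for k in range(steps):
        t = -math.pi + (k + 0.5) * h
        total += f(t) * complex(g(t)).conjugate()
    return total * h


def f_n(n):
    """The function t -> e^{int}."""
    return lambda t: cmath.exp(1j * n * t)


def fourier_coefficient(f, n):
    """<f, f_n> / <f_n, f_n>."""
    return hermitian_integral(f, f_n(n)) / hermitian_integral(f_n(n), f_n(n))


def sample(t):
    """A hypothetical test function: 3 e^{2it} + e^{-it}."""
    return 3 * cmath.exp(2j * t) + cmath.exp(-1j * t)


orthogonal = hermitian_integral(f_n(2), f_n(5))   # close to 0
norm_sq = hermitian_integral(f_n(3), f_n(3))      # close to 2*pi
coeff = fourier_coefficient(sample, 2)            # close to 3
```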

We return to our general discussion of hermitian products. We have
the analogue of Theorem 2.1 and its corollary for positive definite
hermitian products, namely:


Theorem 2.4. Let V be a finite dimensional vector space over the complex
numbers, with a positive definite hermitian product. Let W be a
subspace of V, and let $\{w_1, \ldots, w_m\}$ be an orthogonal basis of W. If
$W \neq V$, then there exist elements $w_{m+1}, \ldots, w_n$ of V such that
$\{w_1, \ldots, w_n\}$ is an orthogonal basis of V.

Corollary 2.5. Let V be a finite dimensional vector space over the complex
numbers, with a positive definite hermitian product. Assume that
$V \neq \{O\}$. Then V has an orthogonal basis.

The proofs are exactly the same as those given previously for the real 
case, and there is no need to repeat them. 

We now come to the theory of the norm. Let V be a vector space
over C, with a positive definite hermitian product. If $v \in V$, we define
its norm by letting

$$\|v\| = \sqrt{\langle v, v\rangle}.$$

Since $\langle v, v\rangle$ is real, $\geq 0$, its square root is taken as
usual to be the unique real number $\geq 0$ whose square is
$\langle v, v\rangle$.

We have the Schwarz inequality, namely

$$|\langle v, w\rangle| \leq \|v\|\, \|w\|.$$

The three properties of the norm hold as in the real case:

For all $v \in V$, we have $\|v\| \geq 0$, and $= 0$ if and only if v = O.

For any complex number $\alpha$, we have $\|\alpha v\| = |\alpha|\, \|v\|$.

For any elements $v, w \in V$ we have $\|v + w\| \leq \|v\| + \|w\|$.

All these are again easily verified. We leave the first two as exercises,
and carry out the third completely, using the Schwarz inequality.

It will suffice to prove that

$$\|v + w\|^2 \leq (\|v\| + \|w\|)^2.$$

To do this, we observe that

$$\|v + w\|^2 = \langle v + w, v + w\rangle
= \langle v, v\rangle + \langle w, v\rangle + \langle v, w\rangle + \langle w, w\rangle.$$

But $\langle w, v\rangle + \langle v, w\rangle
= \overline{\langle v, w\rangle} + \langle v, w\rangle \leq 2|\langle v, w\rangle|$.
Hence by Schwarz,

$$\|v + w\|^2 \leq \|v\|^2 + 2|\langle v, w\rangle| + \|w\|^2
\leq \|v\|^2 + 2\|v\|\, \|w\| + \|w\|^2 = (\|v\| + \|w\|)^2.$$

Taking the square root of each side yields what we want.

An element v of V is said to be a unit vector as in the real case, if
$\|v\| = 1$. An orthogonal basis $\{v_1, \ldots, v_n\}$ is said to be
orthonormal if it consists of unit vectors. As before, we obtain an
orthonormal basis from an orthogonal one by dividing each vector by its
norm.

Let $\{e_1, \ldots, e_n\}$ be an orthonormal basis of V. Let $v, w \in V$.
There exist complex numbers $\alpha_1, \ldots, \alpha_n \in \mathbf{C}$ and
$\beta_1, \ldots, \beta_n \in \mathbf{C}$ such that

$$v = \alpha_1 e_1 + \cdots + \alpha_n e_n$$

and

$$w = \beta_1 e_1 + \cdots + \beta_n e_n.$$

Then

$$\langle v, w\rangle = \langle \alpha_1 e_1 + \cdots + \alpha_n e_n,\; \beta_1 e_1 + \cdots + \beta_n e_n\rangle
= \sum_{i,j=1}^{n} \alpha_i \bar{\beta}_j \langle e_i, e_j\rangle
= \alpha_1 \bar{\beta}_1 + \cdots + \alpha_n \bar{\beta}_n.$$

Thus in terms of this orthonormal basis, if A, B are the coordinate
vectors of v and w respectively, the hermitian product is given by the
product described in Example 3, namely $A \cdot \bar{B}$.

We now have theorems which we state simultaneously for the real and 
complex cases. The proofs are word for word the same as the proof of 
Theorem 2.3, and so will not be reproduced. 


Theorem 2.6. Let V be either a vector space over R with a positive
definite scalar product, or a vector space over C with a positive definite
hermitian product. Assume that V has finite dimension n. Let W be a
subspace of V of dimension r. Let $W^\perp$ be the subspace of V consisting
of all elements of V which are perpendicular to W. Then $W^\perp$ has
dimension n - r. In other words,

$$\dim W + \dim W^\perp = \dim V.$$

Theorem 2.7. Let V be either a vector space over R with a positive
definite scalar product, or a vector space over C with a positive definite
hermitian product. Assume that V is finite dimensional. Let W be a
subspace of V. Then V is the direct sum of W and $W^\perp$.


V, §2. EXERCISES 

0. What is the dimension of the subspace of $\mathbf{R}^6$ perpendicular to
the two vectors (1, 1, -2, 3, 4, 5) and (0, 0, 1, 1, 0, 1)?

1. Find an orthonormal basis for the subspace of $\mathbf{R}^3$ generated by
the following vectors:

(a) (1, 1, -1) and (1, 0, 1) (b) (2, 1, 1) and (1, 3, -1)

2. Find an orthonormal basis for the subspace of $\mathbf{R}^4$ generated by
the following vectors:

(a) (1, 2, 1, 0) and (1, 2, 3, 1)

(b) (1, 1, 0, 0), (1, -1, 1, 1) and (-1, 0, 2, 1)

3. In Exercises 3 through 5 we consider the vector space of continuous
real-valued functions on the interval [0, 1]. We define the scalar product
of two such functions f, g by the rule

$$\langle f, g\rangle = \int_0^1 f(t) g(t)\, dt.$$

Using standard properties of the integral, verify that this is a scalar product.

4. Let V be the subspace of functions generated by the two functions f, g such
that $f(t) = t$ and $g(t) = t^2$. Find an orthonormal basis for V.

5. Let V be the subspace generated by the three functions $1, t, t^2$ (where 1
is the constant function). Find an orthonormal basis for V.

6. Find an orthonormal basis for the subspace of $\mathbf{C}^3$ generated by
the following vectors:

(a) (1, i, 0) and (1, 1, 1) (b) (1, -1, -i) and (i, 1, 2)

7. (a) Let V be the vector space of all n x n matrices over R, and define the
scalar product of two matrices A, B by

$$\langle A, B\rangle = \operatorname{tr}(AB),$$

where tr is the trace (sum of the diagonal elements). Show that this is a
scalar product and that it is non-degenerate.

(b) If A is a real symmetric matrix, show that $\operatorname{tr}(AA) \geq 0$,
and $\operatorname{tr}(AA) > 0$ if $A \neq O$. Thus the trace defines a
positive definite scalar product on the space of real symmetric matrices.

(c) Let V be the vector space of real n x n symmetric matrices. What is
dim V? What is the dimension of the subspace W consisting of those
matrices A such that tr(A) = 0? What is the dimension of the orthogonal
complement $W^\perp$ relative to the positive definite scalar product of
part (b)?

8. Notation as in Exercise 7, describe the orthogonal complement of the
subspace of diagonal matrices. What is the dimension of this orthogonal
complement?

9. Let V be a finite dimensional space over R, with a positive definite scalar
product. Let $\{v_1, \ldots, v_m\}$ be a set of elements of V, of norm 1, and
mutually perpendicular (i.e. $\langle v_i, v_j\rangle = 0$ if $i \neq j$).
Assume that for every $v \in V$ we have

$$\|v\|^2 = \sum_{i=1}^{m} \langle v, v_i\rangle^2.$$

Show that $\{v_1, \ldots, v_m\}$ is a basis of V.

10. Let V be a finite dimensional space over R, with a positive definite scalar
product. Prove the parallelogram law, for any elements $v, w \in V$,

$$\|v + w\|^2 + \|v - w\|^2 = 2\|v\|^2 + 2\|w\|^2.$$





V, §3. APPLICATION TO LINEAR EQUATIONS; THE RANK 


Theorem 2.3 of the preceding section has an interesting application to
the theory of linear equations. We consider such a system:

$$
\begin{aligned}
a_{11} x_1 + \cdots + a_{1n} x_n &= 0 \\
&\;\;\vdots \\
a_{m1} x_1 + \cdots + a_{mn} x_n &= 0.
\end{aligned}
\tag{**}
$$

We can interpret its space of solutions in three ways:

(a) It consists of those vectors X giving linear relations

$$x_1 A^1 + \cdots + x_n A^n = O$$

between the columns of A.

(b) The solutions form the space orthogonal to the row vectors of the
matrix A.

(c) The solutions form the kernel of the linear map represented by A,
i.e. are the solutions of the equation AX = O.


The linear equations are assumed to have coefficients $a_{ij}$ in a field K.
The analogue of Theorem 2.3 is true for the scalar product on $K^n$.
Indeed, let W be a subspace of $K^n$ and let $W^\perp$ be the subset of all
elements $X \in K^n$ such that

$$X \cdot Y = 0 \quad \text{for all } Y \in W.$$

Then $W^\perp$ is a subspace of $K^n$. Observe that we can have
$X \cdot X = 0$ even if $X \neq O$. For instance, let K = C be the complex
numbers and let X = (1, i). Then $X \cdot X = 1 - 1 = 0$. However, the
analogue of Theorem 2.3 is still true, namely:

Theorem 3.1. Let W be a subspace of $K^n$. Then

$$\dim W + \dim W^\perp = n.$$

We shall prove this theorem in §6, Theorem 6.4. Here we shall apply it
to the study of linear equations.

If $A = (a_{ij})$ is an m x n matrix, then the columns $A^1, \ldots, A^n$
generate a subspace, whose dimension is called the column rank of A. The
rows $A_1, \ldots, A_m$ of A generate a subspace whose dimension is called
the row rank of A. We may also say that the column rank of A is the maximum
number of linearly independent columns, and the row rank is the maximum
number of linearly independent rows of A.

Theorem 3.2. Let $A = (a_{ij})$ be an m x n matrix. Then the row rank and
the column rank of A are equal to the same number r. Furthermore,
n - r is the dimension of the space of solutions of the system of linear
equations (**).

Proof. We shall prove all our statements simultaneously. We consider
the map

$$L: K^n \to K^m$$

given by

$$L(X) = x_1 A^1 + \cdots + x_n A^n.$$

This map is obviously linear. Its image consists of the space generated
by the column vectors of A. Its kernel is by definition the space of
solutions of the system of linear equations. By Theorem 3.2 of Chapter III,
§3, we obtain

column rank + dim space of solutions = n.

On the other hand, interpreting the space of solutions as the orthogonal
space to the row vectors, and using the theorem on the dimension of an
orthogonal subspace, we obtain

row rank + dim space of solutions = n.

From this all our assertions follow at once, and Theorem 3.2 is proved.

In view of Theorem 3.2, the row rank, or the column rank, is also
called the rank.
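
As an illustration of Theorem 3.2, the following Python sketch computes the
row rank of a matrix by elimination over the rational numbers, and the column
rank as the row rank of the transpose. The function names and the sample
matrix are hypothetical choices of ours, not taken from the text.

```python
from fractions import Fraction


def rank(matrix):
    """Row rank, computed by Gaussian elimination with exact fractions."""
    rows = [[Fraction(a) for a in row] for row in matrix]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def transpose(matrix):
    """Interchange rows and columns."""
    return [list(col) for col in zip(*matrix)]


A = [[1, 2, 3, 4],
     [2, 4, 6, 8],
     [0, 1, 1, 0]]

row_rank = rank(A)
column_rank = rank(transpose(A))
n = len(A[0])
dim_solutions = n - row_rank   # dimension of the space of solutions of AX = O
```

For this matrix the second row is twice the first, so both ranks equal 2, and
the space of solutions of the homogeneous system has dimension 4 - 2 = 2.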

Remark. Let $L = L_A: K^n \to K^m$ be the linear map given by

$$X \mapsto AX.$$

Then L is also described by the formula

$$L(X) = x_1 A^1 + \cdots + x_n A^n.$$

Therefore

$$\operatorname{rank} A = \dim \operatorname{Im} L_A.$$

Let $b_1, \ldots, b_m$ be numbers, and consider the system of inhomogeneous
equations

$$
\begin{aligned}
A_1 \cdot X &= b_1 \\
&\;\;\vdots \\
A_m \cdot X &= b_m.
\end{aligned}
$$

It may happen that this system has no solution at all, i.e. that the
equations are inconsistent. For instance, the system


2x + 3y — z = 1, 
2x + 3y — z = 2 


has no solution. However, if there is at least one solution, then all
solutions are obtainable from this one by adding an arbitrary solution of
the associated homogeneous system (**) (cf. Exercise 7). Hence in this case
again, we can speak of the dimension of the set of solutions. It is the
dimension of the space of solutions of the associated homogeneous system.

Example 1 . Find the rank of the matrix 



There are only two rows, so the rank is at most 2. On the other hand, 
the two columns 



are linearly independent, for if a , b are numbers such that 



then 


2a + b = 0,
b = 0,


so that a = 0. Therefore the two columns are linearly independent, and 
the rank is equal to 2. 

Example 2. Find the dimension of the set of solutions of the following
system of equations, and determine this set in $\mathbf{R}^3$:

2x + y + z = 1,
y - z = 0.

We see by inspection that there is at least one solution, namely
$x = \tfrac{1}{2}$, y = z = 0. The rank of the matrix

$$\begin{pmatrix} 2 & 1 & 1 \\ 0 & 1 & -1 \end{pmatrix}$$

is 2. Hence the dimension of the set of solutions is 1. The vector space
of solutions of the homogeneous system has dimension 1, and one solution
is easily found to be

y = z = 1, x = -1.

Hence the set of solutions of the inhomogeneous system is the set of all
vectors

$$(\tfrac{1}{2}, 0, 0) + t(-1, 1, 1),$$

where t ranges over all real numbers. We see that our set of solutions is
a straight line.

Example 3. Find a basis for the space of solutions of the equation 


3x — 2y + z = 0. 

Let A = (3, -2, 1). The space of solutions is the space orthogonal to
A, and hence has dimension 2. There are of course many bases for this
space. To find one, we first extend (3, -2, 1) = A to a basis of
$\mathbf{R}^3$. We do this by selecting vectors B, C such that A, B, C are
linearly independent. For instance, take

B = (0, 1, 0)

and

C = (0, 0, 1).

Then A, B, C are linearly independent. To see this, we proceed as usual.
If a, b, c are numbers such that

aA + bB + cC = O,

then

3a = 0,
-2a + b = 0,
a + c = 0.

This is easily solved to see that

a = b = c = 0,

so A, B, C are linearly independent. Now we must orthogonalize these
vectors.

Let

$$B' = B - \frac{\langle B, A\rangle}{\langle A, A\rangle}\, A = \tfrac{1}{7}(3, 5, 1),$$

$$C' = C - \frac{\langle C, A\rangle}{\langle A, A\rangle}\, A
- \frac{\langle C, B'\rangle}{\langle B', B'\rangle}\, B'
= (0, 0, 1) - \tfrac{1}{14}(3, -2, 1) - \tfrac{1}{35}(3, 5, 1).$$

Then {B', C'} is a basis for the space of solutions of the given equation.

V, §3. EXERCISES 

1. Find the rank of the following matrices.

$$\text{(a)}\ \begin{pmatrix} 2 & 1 & 3 \\ 7 & 2 & 0 \end{pmatrix} \qquad
\text{(b)}\ \begin{pmatrix} -1 & 2 & -2 \\ 3 & 4 & -5 \end{pmatrix}$$



2. Let A, B be two matrices which can be multiplied. Show that

rank of AB $\leq$ rank of A, and also rank of AB $\leq$ rank of B.

3. Let A be a triangular matrix



Assume that none of the diagonal elements is equal to 0. What is the rank of 
A? 

4. Find the dimension of the space of solutions of the following systems of
equations. Also find a basis for this space of solutions.

(a) 2x + y - z = 0
    y + z = 0

(b) x - y + z = 0

(c) 4x + 7y - $\pi$z = 0
    2x - y + z = 0

(d) x + y + z = 0
    x - y = 0
    y + z = 0

5. What is the dimension of the space of solutions of the following systems of
linear equations?

(a) 2x - 3y + z = 0
    x + y - z = 0

(b) 2x + 7y = 0
    x - 2y + z = 0

(c) 2x - 3y + z = 0
    x + y - z = 0
    3x + 4y = 0
    5x + y + z = 0

(d) x + y + z = 0
    2x + 2y + 2z = 0

6. Let A be a non-zero vector in n-space. Let P be a point in n-space. What is
the dimension of the set of solutions of the equation

$$X \cdot A = P \cdot A?$$

7. Let AX = B be a system of linear equations, where A is an m x n matrix, X is
an n-vector, and B is an m-vector. Assume that there is one solution
$X = X_0$. Show that every solution is of the form $X_0 + Y$, where Y is a
solution of the homogeneous system AY = O, and conversely any vector of the
form $X_0 + Y$ is a solution.


V, §4. BILINEAR MAPS AND MATRICES 

Let U, V, W be vector spaces over K, and let

$$g: U \times V \to W$$

be a map. We say that g is bilinear if for each fixed $u \in U$ the map

$$v \mapsto g(u, v)$$

is linear, and for each fixed $v \in V$, the map

$$u \mapsto g(u, v)$$

is linear. The first condition written out reads

$$g(u, v_1 + v_2) = g(u, v_1) + g(u, v_2),$$
$$g(u, cv) = c\, g(u, v),$$

and similarly for the second condition on the other side.

Example. Let A be an m x n matrix, $A = (a_{ij})$. We can define a map

$$g_A: K^m \times K^n \to K$$

by letting

$$g_A(X, Y) = {}^tXAY,$$

which written out looks like this:

$$(x_1, \ldots, x_m)
\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}
\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}.$$

Our vectors X and Y are supposed to be column vectors, so that ${}^tX$ is a
row vector, as shown. Then ${}^tXA$ is a row vector, and ${}^tXAY$ is a
1 x 1 matrix, i.e. a number. Thus $g_A$ maps pairs of vectors into K. Such a
map $g_A$ satisfies properties similar to those of a scalar product. If we
fix X, then the map $Y \mapsto {}^tXAY$ is linear, and if we fix Y, then the
map $X \mapsto {}^tXAY$ is also linear. In other words, say fixing X, we have

$$g_A(X, Y + Y') = g_A(X, Y) + g_A(X, Y'),$$
$$g_A(X, cY) = c\, g_A(X, Y),$$

and similarly on the other side. This is merely a reformulation of
properties of multiplication of matrices, namely

$${}^tXA(Y + Y') = {}^tXAY + {}^tXAY',$$
$${}^tXA(cY) = c\, {}^tXAY.$$
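
It is convenient to write out the multiplication ${}^tXAY$ as a sum. Note
that

$${}^tXA = \left( \sum_{i=1}^{m} x_i a_{i1}, \ldots, \sum_{i=1}^{m} x_i a_{in} \right),$$

and thus

$${}^tXAY = \sum_{j=1}^{n} \sum_{i=1}^{m} x_i a_{ij} y_j
= \sum_{j=1}^{n} \sum_{i=1}^{m} a_{ij} x_i y_j.$$

Example. Let

$$A = \begin{pmatrix} 1 & 2 \\ 3 & -1 \end{pmatrix}.$$

If $X = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$ and
$Y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}$ then

$${}^tXAY = x_1 y_1 + 2 x_1 y_2 + 3 x_2 y_1 - x_2 y_2.$$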
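
The double sum is easy to evaluate by machine. The sketch below uses the
matrix whose entries are the coefficients 1, 2, 3, -1 of the expansion in the
preceding example; the function name `bilinear` and the sample vectors are
our own assumptions.

```python
def bilinear(a, x, y):
    """Evaluate tX A Y as the double sum of a[i][j] * x[i] * y[j]."""
    return sum(a[i][j] * x[i] * y[j]
               for i in range(len(x)) for j in range(len(y)))


A = [[1, 2],
     [3, -1]]

X = (1, 2)
Y = (3, 4)
Y2 = (-1, 5)

value = bilinear(A, X, Y)

# The expansion of the example, written out term by term.
expanded = X[0] * Y[0] + 2 * X[0] * Y[1] + 3 * X[1] * Y[0] - X[1] * Y[1]

# Linearity in the second variable, X being fixed.
sum_Y = tuple(p + q for p, q in zip(Y, Y2))
additive = bilinear(A, X, sum_Y) == bilinear(A, X, Y) + bilinear(A, X, Y2)
homogeneous = bilinear(A, X, tuple(7 * p for p in Y)) == 7 * bilinear(A, X, Y)
```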

Theorem 4.1. Given a bilinear map $g: K^m \times K^n \to K$, there exists a
unique matrix A such that $g = g_A$, i.e. such that

$$g(X, Y) = {}^tXAY.$$

The set of bilinear maps of $K^m \times K^n$ into K is a vector space,
denoted by $\operatorname{Bil}(K^m \times K^n, K)$, and the association

$$A \mapsto g_A$$

gives an isomorphism between $\operatorname{Mat}_{m \times n}(K)$ and
$\operatorname{Bil}(K^m \times K^n, K)$.

Proof. We first prove the first statement, concerning the existence of a
unique matrix A such that $g = g_A$. This statement is similar to the
statement representing linear maps by matrices, and its proof is an extension
of previous proofs. Remember that we used the standard basis for $K^n$ to
prove these previous results, and we used coordinates. We do the same
here. Let $E^1, \ldots, E^m$ be the standard unit vectors for $K^m$, and let
$U^1, \ldots, U^n$ be the standard unit vectors for $K^n$. We can then write
any $X \in K^m$ as

$$X = \sum_{i=1}^{m} x_i E^i$$

and any $Y \in K^n$ as

$$Y = \sum_{j=1}^{n} y_j U^j.$$

Then

$$g(X, Y) = g(x_1 E^1 + \cdots + x_m E^m,\; y_1 U^1 + \cdots + y_n U^n).$$

Using the linearity on the left, we find

$$g(X, Y) = \sum_{i=1}^{m} x_i\, g(E^i,\; y_1 U^1 + \cdots + y_n U^n).$$

Using the linearity on the right, we find

$$g(X, Y) = \sum_{i=1}^{m} \sum_{j=1}^{n} x_i y_j\, g(E^i, U^j).$$

Let

$$a_{ij} = g(E^i, U^j).$$

Then we see that

$$g(X, Y) = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij} x_i y_j,$$

which is precisely the expression we obtained for the product

$${}^tXAY,$$

where A is the matrix $(a_{ij})$. This proves that $g = g_A$ for the choice
of $a_{ij}$ given above.

The uniqueness is also easy to see. Suppose that B is a matrix such
that $g = g_B$. Then for all vectors X, Y we must have

$${}^tXAY = {}^tXBY.$$

Subtracting, we find

$${}^tX(A - B)Y = 0$$

for all X, Y. Let C = A - B, so that we can rewrite this last equality as

$${}^tXCY = 0,$$

for all X, Y. Let $C = (c_{ij})$. We must prove that all $c_{ij} = 0$. The
above equation being true for all X, Y, it is true in particular if we let
$X = E^k$ and $Y = U^l$ (the unit vectors!). But then for this choice of
X, Y we find

$$0 = {}^tE^k C U^l = c_{kl}.$$

This proves that $c_{kl} = 0$ for all k, l, and proves the first statement.

The second statement, concerning the isomorphism between the space
of matrices and bilinear maps will be left as an exercise. See Exercises 3
and 4.
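
The existence part of the proof is constructive: the entries of the matrix
are the values of g on pairs of unit vectors. A short Python sketch of this
recipe follows; the bilinear map `g` on $K^2 \times K^3$ is a hypothetical
example of ours, as are the helper names.

```python
def unit_vector(k, size):
    """Standard unit vector with 1 in position k (counted from 0)."""
    return [1 if i == k else 0 for i in range(size)]


def matrix_of(g, m, n):
    """Matrix with entries a_ij = g(E^i, U^j)."""
    return [[g(unit_vector(i, m), unit_vector(j, n)) for j in range(n)]
            for i in range(m)]


def bilinear(a, x, y):
    """Evaluate tX A Y as a double sum."""
    return sum(a[i][j] * x[i] * y[j]
               for i in range(len(x)) for j in range(len(y)))


def g(x, y):
    """A hypothetical bilinear map on K^2 x K^3."""
    return x[0] * y[0] - 2 * x[0] * y[2] + 5 * x[1] * y[1]


A = matrix_of(g, 2, 3)

X, Y = (2, -1), (1, 4, 3)
agrees = g(X, Y) == bilinear(A, X, Y)
```

The recovered matrix reproduces g on the sample vectors, in accordance with
the equality $g = g_A$.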


V, §4. EXERCISES 

1. Let A be an n x n matrix, and assume that A is symmetric, i.e.
$A = {}^tA$. Let $g_A: K^n \times K^n \to K$ be its associated bilinear map.
Show that

$$g_A(X, Y) = g_A(Y, X)$$

for all $X, Y \in K^n$, and thus that $g_A$ is a scalar product, i.e.
satisfies conditions SP 1, SP 2, and SP 3.

2. Conversely, assume that A is an n x n matrix such that

$$g_A(X, Y) = g_A(Y, X)$$

for all X, Y. Show that A is symmetric.

3. Show that the bilinear maps of $K^n \times K^m$ into K form a vector
space. More generally, let $\operatorname{Bil}(U \times V, W)$ be the set of
bilinear maps of $U \times V$ into W. Show that
$\operatorname{Bil}(U \times V, W)$ is a vector space.

4. Show that the association

$$A \mapsto g_A$$

is an isomorphism between the space of m x n matrices, and the space of
bilinear maps of $K^m \times K^n$ into K.

Note: In calculus, if f is a function of n variables, one associates with f a
matrix of second partial derivatives

$$\left( \frac{\partial^2 f}{\partial x_i\, \partial x_j} \right)$$

which is symmetric. This matrix represents the second derivative, which is a
bilinear map.

5. Write out in full in terms of coordinates the expression for ${}^tXAY$
when A is the following matrix, and X, Y are vectors of the corresponding
dimension.

6. Let


(f) 





2 

1 

0 


3 

1 

1 


and define $g(X, Y) = {}^tXCY$. Find two vectors $X, Y \in \mathbf{R}^3$
such that

$$g(X, Y) \neq g(Y, X).$$

V, §5. GENERAL ORTHOGONAL BASES

Let V be a finite dimensional vector space over the field K, with a scalar
product. This scalar product need not be positive definite, but there are
interesting examples of such products nevertheless, even over the real
numbers. For instance, one may define the product of two vectors
$X = (x_1, x_2)$ and $Y = (y_1, y_2)$ to be $x_1 y_1 - x_2 y_2$. Thus

$$\langle X, X\rangle = x_1^2 - x_2^2.$$

Such products arise in many applications, in physics for instance, where
one deals with a product of vectors in 4-space, such that if

$$X = (x, y, z, t),$$

then

$$\langle X, X\rangle = x^2 + y^2 + z^2 - t^2.$$

In this section, we shall see what can be salvaged of the theorems
concerning orthogonal bases.

Let V be a finite dimensional vector space over the field K, with a
scalar product. If W is a subspace, it is not always true in general that V
is the direct sum of W and $W^\perp$. This comes from the fact that there
may be a non-zero vector v in V such that $\langle v, v\rangle = 0$. For
instance, over the complex numbers, (1, i) is such a vector. The theorem
concerning the existence of an orthogonal basis is still true, however, and
we shall prove it by a suitable modification of the arguments given in the
preceding section.

We begin with some remarks. First, suppose that for every element u of
V we have $\langle u, u\rangle = 0$. The scalar product is then said to be
null, and V is called a null space. The reason for this is that we
necessarily have $\langle v, w\rangle = 0$ for all v, w in V. Indeed, we can
write

$$\langle v, w\rangle = \tfrac{1}{2}\left[ \langle v + w, v + w\rangle
- \langle v, v\rangle - \langle w, w\rangle \right].$$

By assumption, the right-hand side of this equation is equal to 0, as one
sees trivially by expanding out the indicated scalar products. Any basis
of V is then an orthogonal basis by definition.

Theorem 5.1. Let V be a finite dimensional vector space over the field
K, and assume that V has a scalar product. If $V \neq \{O\}$, then V has an
orthogonal basis.

Proof. We shall prove this by induction on the dimension of V. If V
has dimension 1, then any non-zero element of V is an orthogonal basis
of V so our assertion is trivial.

Assume now that dim V = n > 1. Two cases arise.

Case 1. For every element $u \in V$, we have $\langle u, u\rangle = 0$. Then
we already observed that any basis of V is an orthogonal basis.

Case 2. There exists an element $v_1$ of V such that
$\langle v_1, v_1\rangle \neq 0$. We can then apply the same method that was
used in the positive definite case, i.e. the Gram-Schmidt orthogonalization.
We shall in fact prove that if $v_1$ is an element of V such that
$\langle v_1, v_1\rangle \neq 0$, and if $V_1$ is the 1-dimensional space
generated by $v_1$, then V is the direct sum of $V_1$ and $V_1^\perp$.
Let $v \in V$ and let c be as always,

$$c = \frac{\langle v, v_1\rangle}{\langle v_1, v_1\rangle}.$$

Then $v - cv_1$ lies in $V_1^\perp$, and hence the expression

$$v = (v - cv_1) + cv_1$$

shows that V is the sum of $V_1$ and $V_1^\perp$. This sum is direct, because
$V_1 \cap V_1^\perp$ is a subspace of $V_1$, which cannot be equal to $V_1$
(because $\langle v_1, v_1\rangle \neq 0$), and hence must be O because $V_1$
has dimension 1. Since $\dim V_1^\perp < \dim V$, we can now repeat our
entire procedure dealing with the space $V_1^\perp$, in other words use
induction. Thus we find an orthogonal basis of $V_1^\perp$, say
$\{v_2, \ldots, v_n\}$. It follows at once that $\{v_1, \ldots, v_n\}$ is an
orthogonal basis of V.

Example 1. In $\mathbf{R}^2$, let $X = (x_1, x_2)$ and $Y = (y_1, y_2)$.
Define their product

$$\langle X, Y\rangle = x_1 y_1 - x_2 y_2.$$

Then it happens that (1, 0) and (0, 1) form an orthogonal basis for
this product also. However, (1, 2) and (2, 1) form an orthogonal basis
for this product, but are not an orthogonal basis for the ordinary dot
product.

Example 2. Let V be the subspace of $\mathbf{R}^3$ generated by the two
vectors A = (1, 2, 1) and B = (1, 1, 1). If $X = (x_1, x_2, x_3)$ and
$Y = (y_1, y_2, y_3)$ are vectors in $\mathbf{R}^3$, define their product to
be

$$\langle X, Y\rangle = x_1 y_1 - x_2 y_2 - x_3 y_3.$$

We wish to find an orthogonal basis of V with respect to this product.
We note that $\langle A, A\rangle = 1 - 4 - 1 = -4 \neq 0$. We let
$v_1 = A$. We can then orthogonalize B, and we let

$$c = \frac{\langle B, A\rangle}{\langle A, A\rangle} = \frac{1}{2}.$$

We let $v_2 = B - \tfrac{1}{2}A$. Then $\{v_1, v_2\}$ is an orthogonal basis
of V with respect to the given product.
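
The computation of Example 2 can be reproduced with exact fractions. The
following Python sketch is only an illustration; the function name `product`
is ours, and the orthogonalization step is valid only because
$\langle A, A\rangle \neq 0$.

```python
from fractions import Fraction


def product(x, y):
    """The scalar product of Example 2: x1*y1 - x2*y2 - x3*y3."""
    return x[0] * y[0] - x[1] * y[1] - x[2] * y[2]


A = [Fraction(1), Fraction(2), Fraction(1)]
B = [Fraction(1), Fraction(1), Fraction(1)]

# The step below divides by <A, A>, so this number must be non-zero.
assert product(A, A) != 0

c = product(B, A) / product(A, A)
v1 = A
v2 = [b - c * a for a, b in zip(A, B)]

perpendicular = product(v1, v2) == 0
self_product_v2 = product(v2, v2)
```

One finds $v_2 = (\tfrac{1}{2}, 0, \tfrac{1}{2})$, and also
$\langle v_2, v_2\rangle = 0$ although $v_2 \neq O$. This cannot happen for a
positive definite product, and shows that the vectors of an orthogonal basis
in the present general sense cannot always be divided by a norm.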


V, §5. EXERCISES 

1. Find orthogonal bases of the subspace of $\mathbf{R}^3$ generated by the
indicated vectors A, B, with respect to the indicated scalar product, written
$X \cdot Y$.

(a) A = (1, 1, 1), B = (1, -1, 2);

$X \cdot Y = x_1 y_1 + 2 x_2 y_2 + x_3 y_3$

(b) A = (1, -1, 4), B = (-1, 1, 3);

$X \cdot Y = -3 x_2 y_2 + x_1 y_3 + y_1 x_3 - x_3 y_2 - x_2 y_3$

2. Find an orthogonal base for the space $\mathbf{C}^2$ over C, if the scalar
product is given by
$X \cdot Y = x_1 y_1 - i x_2 y_1 - i x_1 y_2 - 2 x_2 y_2$.

3. Same question as in Exercise 2, if the scalar product is given by

$X \cdot Y = x_1 y_2 + x_2 y_1 + 4 x_1 y_1$.


V, §6. THE DUAL SPACE AND SCALAR PRODUCTS 

This section merely introduces a name for some notions and properties 
we have already met in greater generality. But the special case to be 
considered is important. 
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Let V be a vector space over the field K. We view K as a one-dimen¬ 
sional vector space over itself. The set of all linear maps of V into K is 
called the dual space, and will be denoted by V*. Thus by definition 


V* = J &(V 9 K). 


Elements of the dual space are usually called functionals. 

Suppose that $V$ is of finite dimension $n$. Then $V$ is isomorphic to $K^n$.
In other words, after a basis has been chosen, we can associate to each
element of $V$ its coordinate vector in $K^n$. Suppose therefore that $V = K^n$.
By what we saw in Chapter IV, §2 and §3, given a functional

$$\varphi: K^n \to K$$

there exists a unique element $A \in K^n$ such that

$$\varphi(X) = A \cdot X \quad \text{for all } X \in K^n.$$

Thus $\varphi = L_A$. We also saw that the association

$$A \mapsto L_A$$

is a linear map, and therefore this association is an isomorphism

$$K^n \to V^*$$

between $K^n$ and $V^*$. In particular:

Theorem 6.1. Let V be a vector space of finite dimension. Then 
dim V* = dim V. 
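
The correspondence $A \mapsto L_A$ can be made concrete: the vector $A$ representing a functional is recovered by evaluating the functional on the standard unit vectors. The sketch below illustrates this under assumptions; the functional `phi` and the helper names are hypothetical, and integers stand in for elements of $K$.

```python
def representing_vector(phi, n):
    """Return A with phi(X) = A . X, by evaluating phi on the standard basis."""
    A = []
    for i in range(n):
        e_i = [1 if k == i else 0 for k in range(n)]
        A.append(phi(e_i))
    return A

def dot(a, x):
    return sum(ai * xi for ai, xi in zip(a, x))

# Hypothetical functional on K^3, given by a formula rather than by a vector.
phi = lambda x: 2 * x[0] - x[1] + 5 * x[2]

A = representing_vector(phi, 3)
X = [3, 4, -1]
print(A, phi(X), dot(A, X))
```

Since a linear map is determined by its values on a basis, `phi(X)` and `dot(A, X)` agree for every `X`, not only for the sample vector shown.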


Example 1. Let $V = K^n$. Let $\varphi: K^n \to K$ be the projection on the first
factor, i.e.

$$\varphi(x_1, \ldots, x_n) = x_1.$$

Then $\varphi$ is a functional. Similarly, for each $i = 1, \ldots, n$ we have a
functional $\varphi_i$ such that

$$\varphi_i(x_1, \ldots, x_n) = x_i.$$

These functionals are just the coordinate functions.

Let $V$ be finite dimensional of dimension $n$. Let $\{v_1, \ldots, v_n\}$ be a basis.
Write each element $v$ in terms of its coordinate vector

$$v = x_1v_1 + \cdots + x_nv_n.$$

For each $i$ we let

$$\varphi_i: V \to K$$

be the functional such that

$$\varphi_i(v_i) = 1 \quad\text{and}\quad \varphi_i(v_j) = 0 \ \text{ if } i \neq j.$$

Then

$$\varphi_i(v) = x_i.$$

The functionals $\{\varphi_1, \ldots, \varphi_n\}$ form a basis of $V^*$, called the dual basis of
$\{v_1, \ldots, v_n\}$.
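
When $V = K^n$, the dual basis can be computed explicitly: if the basis vectors are the columns of a matrix $P$, then the rows of $P^{-1}$ give the coefficients of $\varphi_1, \ldots, \varphi_n$. The following is a minimal sketch over the rationals; the basis and the helper names `inverse` and `apply` are assumptions made for illustration.

```python
from fractions import Fraction

def inverse(M):
    """Invert a square matrix over Q by Gauss-Jordan elimination."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Hypothetical basis of Q^2: v1 = (1, 1), v2 = (1, 2).
basis = [[1, 1], [1, 2]]
P = [[basis[j][i] for j in range(2)] for i in range(2)]   # columns are v1, v2
dual = inverse(P)   # row i holds the coefficients of phi_i

def apply(phi, v):
    return sum(a * x for a, x in zip(phi, v))

# Entry (i, j) is phi_i(v_j); it should be 1 if i == j and 0 otherwise.
table = [[apply(phi, v) for v in basis] for phi in dual]
print(table)
```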

Example 2. Let $V$ be a vector space over $K$, with a scalar product.
Let $v_0$ be an element of $V$. The map

$$v \mapsto \langle v, v_0\rangle, \quad v \in V,$$

is a functional, as follows at once from the definition of a scalar product.

Example 3. Let $V$ be the vector space of continuous real-valued functions
on the interval $[0, 1]$. We can define a functional $L: V \to R$ by the
formula

$$L(f) = \int_0^1 f(t)\,dt$$

for $f \in V$. Standard properties of the integral show that this is a linear
map. If $f_0$ is a fixed element of $V$, then the map

$$f \mapsto \int_0^1 f_0(t)f(t)\,dt$$

is also a functional on $V$.

Example 4. Let $V$ be as in Example 3. Let $\delta: V \to R$ be the map such
that $\delta(f) = f(0)$. Then $\delta$ is a functional, called the Dirac functional.

Example 5. Let $V$ be a vector space over the complex numbers, and
suppose that $V$ has a hermitian product. Let $v_0$ be an element of $V$. The
map

$$v \mapsto \langle v, v_0\rangle, \quad v \in V,$$

is a functional. However, it is not true that the map $v \mapsto \langle v_0, v\rangle$ is a
functional! Indeed, we have for any $\alpha \in C$,

$$\langle v_0, \alpha v\rangle = \bar{\alpha}\langle v_0, v\rangle.$$

Hence this last map is not linear. It is sometimes called anti-linear or
semi-linear.
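
The difference between the two maps of Example 5 shows up in a small computation. The sketch below assumes the standard hermitian product on $C^2$ (linear in the first variable, anti-linear in the second); the vectors and the scalar are hypothetical sample data.

```python
def hermitian(v, w):
    """Standard hermitian product on C^n: linear in v, anti-linear in w."""
    return sum(a * b.conjugate() for a, b in zip(v, w))

v0 = [1 + 2j, 3j]        # hypothetical fixed vector
v = [2 - 1j, 1 + 1j]     # hypothetical test vector
alpha = 1j

scaled = [alpha * x for x in v]
linear_side = hermitian(scaled, v0)      # equals alpha * <v, v0>
anti_side = hermitian(v0, scaled)        # equals conj(alpha) * <v0, v>

print(linear_side, alpha * hermitian(v, v0))
print(anti_side, alpha.conjugate() * hermitian(v0, v))
```

With `alpha = 1j` the conjugate differs from `alpha` itself, so the second map visibly fails to be linear.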

Let $V$ be a vector space over the field $K$, and assume given a scalar
product on $V$. To each element $v \in V$ we can associate a functional $L_v$ in
the dual space, namely the map such that

$$L_v(w) = \langle v, w\rangle$$

for all $w \in V$. If $v_1, v_2 \in V$, then $L_{v_1 + v_2} = L_{v_1} + L_{v_2}$. If $c \in K$ then $L_{cv} = cL_v$.
These relations are essentially a rephrasing of the definition of scalar
product. We may say that the map

$$v \mapsto L_v$$

is a linear map of $V$ into the dual space $V^*$. The next theorem is very
important.

Theorem 6.2. Let $V$ be a finite dimensional vector space over $K$ with a
non-degenerate scalar product. Then the map

$$v \mapsto L_v$$

is an isomorphism of $V$ with the dual space $V^*$.

Proof. We have seen that this map is linear. Suppose $L_v = 0$. This
means that $\langle v, w\rangle = 0$ for all $w \in V$. By the definition of non-degenerate,
this implies that $v = 0$. Hence the map $v \mapsto L_v$ is injective. Since
$\dim V = \dim V^*$, it follows from Theorem 3.3 of Chapter III that this
map is an isomorphism, as was to be shown.

In the theorem, we say that the vector $v$ represents the functional $L$
with respect to the non-degenerate scalar product.

Examples. We let $V = K^n$ with the usual dot product,

$$X \cdot Y = x_1y_1 + \cdots + x_ny_n,$$

which we know is non-degenerate. If

$$\varphi: K^n \to K$$

is a linear map, then there exists a unique vector $A \in K^n$ such that for all
$H \in K^n$ we have

$$\varphi(H) = A \cdot H.$$

This allows us to represent the functional $\varphi$ by the vector $A$.

Example from calculus. Let $U$ be an open set in $R^n$ and let

$$f: U \to R$$

be a differentiable function. In calculus of several variables, this means
that for each point $X \in U$ there is a function $g(H)$, defined for small
vectors $H$ such that

$$\lim_{H \to O} g(H) = 0,$$

and there is a linear map $L: R^n \to R$ such that

$$f(X + H) = f(X) + L(H) + \|H\|g(H).$$

By the above considerations, there is a unique element $A \in R^n$ such that
$L = L_A$, that is

$$f(X + H) = f(X) + A \cdot H + \|H\|g(H).$$

In fact, this vector $A$ is the vector of partial derivatives

$$A = \left(\frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n}\right)$$

and $A$ is called the gradient of $f$ at $X$. Thus the formula can be written

$$f(X + H) = f(X) + (\operatorname{grad} f)(X) \cdot H + \|H\|g(H).$$

The vector $(\operatorname{grad} f)(X)$ represents the functional $L: R^n \to R$. The
functional $L$ is usually denoted by $f'(X)$, so we can also write

$$f(X + H) = f(X) + f'(X)H + \|H\|g(H).$$

The functional $L$ is also called the derivative of $f$ at $X$.
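
The role of the gradient as the vector representing the derivative can be observed numerically: the remainder $g(H)$ shrinks as $H$ does. The following sketch uses a hypothetical function $f(x, y) = x^2y + 3y$ with its gradient computed by hand; neither comes from the text.

```python
def f(x):
    """Hypothetical differentiable function on R^2: f(x, y) = x^2 * y + 3y."""
    return x[0] ** 2 * x[1] + 3 * x[1]

def grad_f(x):
    """Gradient of f, computed by hand: (2xy, x^2 + 3)."""
    return [2 * x[0] * x[1], x[0] ** 2 + 3]

def remainder(X, H):
    """g(H) = (f(X + H) - f(X) - A.H) / ||H||, with A = (grad f)(X)."""
    A = grad_f(X)
    shifted = [xi + hi for xi, hi in zip(X, H)]
    linear_part = sum(ai * hi for ai, hi in zip(A, H))
    norm_H = sum(hi * hi for hi in H) ** 0.5
    return (f(shifted) - f(X) - linear_part) / norm_H

X = [1.0, 2.0]
for s in (1e-1, 1e-2, 1e-3):
    # The printed remainder should tend to 0 as the step s shrinks.
    print(s, remainder(X, [s, -2 * s]))
```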


Theorem 6.3. Let $V$ be a vector space of dimension $n$. Let $W$ be a
subspace of $V$ and let

$$W^\perp = \{\varphi \in V^* \text{ such that } \varphi(W) = 0\}.$$

Then

$$\dim W + \dim W^\perp = n.$$

Proof. If $W = \{O\}$, the theorem is immediate. Assume $W \neq \{O\}$, and
let $\{w_1, \ldots, w_r\}$ be a basis of $W$. Extend this basis to a basis

$$\{w_1, \ldots, w_r, w_{r+1}, \ldots, w_n\}$$

of $V$. Let $\{\varphi_1, \ldots, \varphi_n\}$ be the dual basis. We shall now show that
$\{\varphi_{r+1}, \ldots, \varphi_n\}$ is a basis of $W^\perp$. Indeed, $\varphi_j(W) = 0$ if $j = r+1, \ldots, n$, so
$\{\varphi_{r+1}, \ldots, \varphi_n\}$ is a basis of a subspace of $W^\perp$. Conversely, let $\varphi \in W^\perp$.
Write

$$\varphi = a_1\varphi_1 + \cdots + a_n\varphi_n.$$

Since $\varphi(W) = 0$ we have

$$\varphi(w_i) = a_i = 0 \quad \text{for } i = 1, \ldots, r.$$

Hence $\varphi$ lies in the space generated by $\varphi_{r+1}, \ldots, \varphi_n$. This proves the
theorem.

Let $V$ be a vector space of dimension $n$, with a non-degenerate scalar
product. We have seen in Theorem 6.2 that the map

$$v \mapsto L_v$$

gives an isomorphism of $V$ with its dual space $V^*$. Let $W$ be a subspace
of $V$. Then we have two possible orthogonal complements of $W$.

First, we may define

$$\operatorname{perp}_V(W) = \{v \in V \text{ such that } \langle v, w\rangle = 0 \text{ for all } w \in W\}.$$

Second, we may define

$$\operatorname{perp}_{V^*}(W) = \{\varphi \in V^* \text{ such that } \varphi(W) = 0\}.$$

The map

$$v \mapsto L_v$$

of Theorem 6.2 gives an isomorphism

$$\operatorname{perp}_V(W) \xrightarrow{\ \approx\ } \operatorname{perp}_{V^*}(W).$$


Therefore we obtain as a corollary of Theorem 6.3:

Theorem 6.4. Let $V$ be a finite dimensional vector space with a
non-degenerate scalar product. Let $W$ be a subspace. Let $W^\perp$ be the subspace
of $V$ consisting of all elements $v \in V$ such that $\langle v, w\rangle = 0$ for all $w \in W$.
Then

$$\dim W + \dim W^\perp = \dim V.$$

This proves Theorem 3.1, which we needed in the study of linear
equations. For this particular application, we take the scalar product to
be the ordinary dot product. Thus if $W$ is a subspace of $K^n$ and

$$W^\perp = \{X \in K^n \text{ such that } X \cdot Y = 0 \text{ for all } Y \in W\}$$

then

$$\dim W + \dim W^\perp = n.$$
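
For the ordinary dot product on $K^n$, the space $W^\perp$ is the solution space of the homogeneous system whose rows generate $W$, so the dimension formula can be checked by row reduction. The following is a sketch over the rationals; the generators of $W$ and the helper names `rref` and `perp_basis` are assumptions for illustration.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form over Q; returns (matrix, list of pivot columns)."""
    M = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(M[0])):
        pr = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pr is None:
            continue
        M[r], M[pr] = M[pr], M[r]
        p = M[r][c]
        M[r] = [x / p for x in M[r]]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                factor = M[i][c]
                M[i] = [x - factor * y for x, y in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    return M, pivots

def perp_basis(rows):
    """Basis of {X : X . Y = 0 for every row Y}, one vector per free column."""
    M, pivots = rref(rows)
    n = len(rows[0])
    basis = []
    for fc in (c for c in range(n) if c not in pivots):
        x = [Fraction(0)] * n
        x[fc] = Fraction(1)
        for r, pc in enumerate(pivots):
            x[pc] = -M[r][fc]
        basis.append(x)
    return basis

# Hypothetical generators of a subspace W of Q^4 (third row = first + second).
W = [[1, 2, 0, 1], [0, 1, 1, 1], [1, 3, 1, 2]]
dim_W = len(rref(W)[1])
W_perp = perp_basis(W)
print(dim_W, len(W_perp))
```

The number of pivot columns is $\dim W$ and the number of free columns is $\dim W^\perp$, so their sum is the number of columns $n$.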


V, §6. EXERCISES 

1. Let $A$, $B$ be two linearly independent vectors in $R^n$. What is the dimension of
the space perpendicular to both $A$ and $B$?

2. Let $A$, $B$ be two linearly independent vectors in $C^n$. What is the dimension of
the subspace of $C^n$ perpendicular to both $A$ and $B$? (Perpendicularity refers to
the ordinary dot product of vectors in $C^n$.)

3. Let $W$ be the subspace of $C^3$ generated by the vector $(1, i, 0)$. Find a basis of
$W^\perp$ in $C^3$ (with respect to the ordinary dot product of vectors).

4. Let $V$ be a vector space of finite dimension $n$ over the field $K$. Let $\varphi$ be a
functional on $V$, and assume $\varphi \neq 0$. What is the dimension of the kernel of
$\varphi$? Proof?

5. Let $V$ be a vector space of dimension $n$ over the field $K$. Let $\psi$, $\varphi$ be two
non-zero functionals on $V$. Assume that there is no element $c \in K$, $c \neq 0$ such
that $\psi = c\varphi$. Show that

$$(\operatorname{Ker} \varphi) \cap (\operatorname{Ker} \psi)$$

has dimension $n - 2$.

6. Let $V$ be a vector space of dimension $n$ over the field $K$. Let $V^{**}$ be the dual
space of $V^*$. Show that each element $v \in V$ gives rise to an element $\lambda_v$ in $V^{**}$
and that the map $v \mapsto \lambda_v$ gives an isomorphism of $V$ with $V^{**}$.

7. Let $V$ be a finite dimensional vector space over the field $K$, with a
non-degenerate scalar product. Let $W$ be a subspace. Show that $W^{\perp\perp} = W$.


V, §7. QUADRATIC FORMS

A scalar product on a vector space $V$ is also called a symmetric bilinear
form. The word “symmetric” is used because of condition SP 1 in
Chapter V, §1. The word “bilinear” is used because of conditions SP 2
and SP 3. The word “form” is used because the map

$$(v, w) \mapsto \langle v, w\rangle$$

is scalar valued. Such a scalar product is often denoted by a letter, like
a function

$$g: V \times V \to K.$$

Thus we write

$$g(v, w) = \langle v, w\rangle.$$

Let $V$ be a finite dimensional space over the field $K$. Let $g = \langle\ ,\ \rangle$ be
a scalar product on $V$. By the quadratic form determined by $g$, we shall
mean the function

$$f: V \to K$$

such that $f(v) = g(v, v) = \langle v, v\rangle$.

Example 1. If $V = K^n$, then $f(X) = X \cdot X = x_1^2 + \cdots + x_n^2$ is the
quadratic form determined by the ordinary dot product.

In general, if $V = K^n$ and $C$ is a symmetric matrix in $K$, representing
a symmetric bilinear form, then the quadratic form is given as a function
of $X$ by

$$f(X) = {}^tXCX = \sum_{i,j=1}^{n} c_{ij}x_ix_j.$$

If $C$ is a diagonal matrix, say

$$C = \begin{pmatrix} c_1 & 0 & \cdots & 0 \\ 0 & c_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & c_n \end{pmatrix}$$

then the quadratic form has a simpler expression, namely

$$f(X) = c_1x_1^2 + \cdots + c_nx_n^2.$$

Let $V$ be again a finite dimensional vector space over the field $K$. Let
$g$ be a scalar product, and $f$ its quadratic form. Then we can recover the
values of $g$ entirely from those of $f$, because for $v, w \in V$,

$$\langle v, w\rangle = \tfrac{1}{4}\left[\langle v + w, v + w\rangle - \langle v - w, v - w\rangle\right]$$

or using $g$, $f$,

$$g(v, w) = \tfrac{1}{4}\left[f(v + w) - f(v - w)\right].$$

We also have the formula

$$\langle v, w\rangle = \tfrac{1}{2}\left[\langle v + w, v + w\rangle - \langle v, v\rangle - \langle w, w\rangle\right].$$

The proof is easy, expanding out using the bilinearity. For instance, for
the second formula, we have

$$\langle v + w, v + w\rangle - \langle v, v\rangle - \langle w, w\rangle
= \langle v, v\rangle + 2\langle v, w\rangle + \langle w, w\rangle - \langle v, v\rangle - \langle w, w\rangle
= 2\langle v, w\rangle.$$

We leave the first as an exercise.
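
Both formulas recovering $g$ from $f$ can be tested on sample data. The sketch below assumes a hypothetical symmetric matrix $C$ on $Q^2$ and two hypothetical vectors; exact fractions are used so the factors $\frac{1}{4}$ and $\frac{1}{2}$ introduce no rounding.

```python
from fractions import Fraction

C = [[1, 2], [2, -3]]   # hypothetical symmetric matrix

def g(v, w):
    """Symmetric bilinear form g(v, w) = tv C w."""
    return sum(v[i] * C[i][j] * w[j] for i in range(2) for j in range(2))

def f(v):
    """Quadratic form determined by g."""
    return g(v, v)

def add(v, w):
    return [a + b for a, b in zip(v, w)]

def sub(v, w):
    return [a - b for a, b in zip(v, w)]

v, w = [1, -2], [3, 5]
first = Fraction(1, 4) * (f(add(v, w)) - f(sub(v, w)))
second = Fraction(1, 2) * (f(add(v, w)) - f(v) - f(w))
print(g(v, w), first, second)
```

Note that both formulas divide by 2, so they presuppose that 2 is invertible in $K$; this holds for the rational, real and complex numbers used in this sketch.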

Example 2. Let $V = R^2$ and let ${}^tX = (x, y)$ denote elements of $R^2$.
The function $f$ such that

$$f(x, y) = 2x^2 + 3xy + y^2$$

is a quadratic form. Let us find the matrix of its bilinear symmetric form
$g$. We write this matrix

$$C = \begin{pmatrix} a & b \\ b & d \end{pmatrix}$$

and we must have

$$f(x, y) = (x, y)\begin{pmatrix} a & b \\ b & d \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}$$

or in other words

$$2x^2 + 3xy + y^2 = ax^2 + 2bxy + dy^2.$$

Thus we obtain $a = 2$, $2b = 3$, and $d = 1$. The matrix is therefore

$$C = \begin{pmatrix} 2 & \frac{3}{2} \\ \frac{3}{2} & 1 \end{pmatrix}.$$



Application with calculus. Let

$$f: R^n \to R$$

be a function which has partial derivatives of order 1 and 2, and such
that the partial derivatives are continuous functions. Assume that

$$f(tX) = t^2f(X) \quad \text{for all } X \in R^n.$$

Then $f$ is a quadratic form, that is there exists a symmetric matrix
$A = (a_{ij})$ such that

$$f(X) = \sum_{i,j=1}^{n} a_{ij}x_ix_j.$$

The proof of course takes calculus of several variables. See for
instance my own book on the subject.
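
The matrix of the quadratic form $2x^2 + 3xy + y^2$ from Example 2, and the homogeneity property $f(tX) = t^2f(X)$ that every quadratic form satisfies, can both be checked with a short computation. This is a sketch only; the helper names `form_matrix` and `f` and the sample values of $X$ and $t$ are hypothetical.

```python
from fractions import Fraction

def form_matrix(p, q, r):
    """Symmetric matrix of p*x^2 + q*xy + r*y^2: the xy coefficient is halved."""
    half = Fraction(q, 2)
    return [[Fraction(p), half], [half, Fraction(r)]]

def f(C, X):
    """Quadratic form tX C X in two variables."""
    return sum(X[i] * C[i][j] * X[j] for i in range(2) for j in range(2))

C = form_matrix(2, 3, 1)          # the form of Example 2
X, t = [3, -1], 5                 # hypothetical sample point and scalar
print(C, f(C, X), f(C, [t * x for x in X]))
```

Splitting the mixed coefficient evenly between the two off-diagonal entries is what makes the matrix symmetric.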


V, §7. EXERCISES 

1. Let $V$ be a finite dimensional vector space over a field $K$. Let $f: V \to K$ be a
function, and assume that the function $g$ defined by

$$g(v, w) = f(v + w) - f(v) - f(w)$$

is bilinear. Assume that $f(av) = a^2f(v)$ for all $v \in V$ and $a \in K$. Show that $f$ is
a quadratic form, and determine a bilinear form from which it comes. Show
that this bilinear form is unique.

2. What is the associated matrix of the quadratic form

$$f(X) = x^2 - 3xy + 4y^2$$

if ${}^tX = (x, y, z)$?

3. Let $x_1, x_2, x_3, x_4$ be the coordinates of a vector $X$, and $y_1, y_2, y_3, y_4$ the
coordinates of a vector $Y$. Express in terms of these coordinates the bilinear
form associated with the following quadratic forms.

(a) $x_1x_2$   (b) $x_1x_3 + x_4^2$   (c) $2x_1x_2 - x_3x_4$   (d) $x_1^2 - 5x_2x_3 + x_4^2$

4. Show that if $f_1$ is the quadratic form of the bilinear form $g_1$, and $f_2$ the
quadratic form of the bilinear form $g_2$, then $f_1 + f_2$ is the quadratic form of the
bilinear form $g_1 + g_2$.


V, §8. SYLVESTER’S THEOREM 

Let $V$ be a finite dimensional vector space over the real numbers, of
dimension $> 0$. Let $\langle\ ,\ \rangle$ be a scalar product on $V$. As we know, by
Theorem 5.1 we can always find an orthogonal basis. Our scalar product
need not be positive definite, and hence it may happen that there is a
vector $v \in V$ such that $\langle v, v\rangle = 0$, or $\langle v, v\rangle = -1$.

Example. Let V = R 2 , and let the form be represented by the matrix 


Then the vectors 







and

$$v_2 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$$

form an orthogonal basis for the form, and we have

$$\langle v_1, v_1\rangle = -1, \quad \text{as well as} \quad \langle v_2, v_2\rangle = 0.$$

For instance, in terms of coordinates, if ${}^tX = (1, 1)$ is the coordinate
vector of say $v_2$ with respect to the standard basis of $R^2$ then a trivial direct
computation shows that

$$\langle X, X\rangle = {}^tXCX = 0.$$


Our purpose in this section is to analyse the general situation in
arbitrary dimensions.

Let $\{v_1, \ldots, v_n\}$ be an orthogonal basis of $V$. Let

$$c_i = \langle v_i, v_i\rangle.$$

After renumbering the elements of our basis if necessary, we may assume
that $\{v_1, \ldots, v_n\}$ are so ordered that:

$$c_1, \ldots, c_r > 0,$$

$$c_{r+1}, \ldots, c_s < 0,$$

$$c_{s+1}, \ldots, c_n = 0.$$

We are interested in the number of positive terms, negative terms, and
zero terms, among the “squares” $\langle v_i, v_i\rangle$, in other words, in the numbers
$r$ and $s$. We shall see in this section that these numbers do not depend
on the choice of orthogonal basis.

If $X$ is the coordinate vector of an element of $V$ with respect to our
basis, and if $f$ is the quadratic form associated with our scalar product,
then in terms of the coordinate vector, we have

$$f(X) = c_1x_1^2 + \cdots + c_rx_r^2 + c_{r+1}x_{r+1}^2 + \cdots + c_sx_s^2.$$

We see that in the expression of $f$ in terms of coordinates, there are
exactly $r$ positive terms, and $s - r$ negative terms. Furthermore, $n - s$
variables have disappeared.

We can see this even more clearly by further normalizing our basis.
We generalize our notion of orthonormal basis. We define an orthogonal
basis $\{v_1, \ldots, v_n\}$ to be orthonormal if for each $i$ we have

$$\langle v_i, v_i\rangle = 1 \quad\text{or}\quad \langle v_i, v_i\rangle = -1 \quad\text{or}\quad \langle v_i, v_i\rangle = 0.$$

If $\{v_1, \ldots, v_n\}$ is an orthogonal basis, then we can obtain an orthonormal
basis from it just as in the positive definite case. We let $c_i = \langle v_i, v_i\rangle$.
If $c_i = 0$, we let

$$v_i' = v_i.$$

If $c_i > 0$, we let

$$v_i' = \frac{v_i}{\sqrt{c_i}}.$$

If $c_i < 0$, we let

$$v_i' = \frac{v_i}{\sqrt{-c_i}}.$$

Then $\{v_1', \ldots, v_n'\}$ is an orthonormal basis.
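
The three cases of this normalization translate directly into a short routine. The sketch below assumes a hypothetical degenerate form $\langle X, Y\rangle = 4x_1y_1 - 16x_2y_2$ on $R^3$, for which the standard basis is already orthogonal; the function names are illustrative.

```python
import math

def normalize(basis, product):
    """Rescale an orthogonal basis so each <v, v> becomes 1, -1, or stays 0."""
    result = []
    for v in basis:
        c = product(v, v)
        if c == 0:
            result.append(list(v))              # case c_i = 0: keep v_i
        else:
            s = math.sqrt(abs(c))               # sqrt(c_i) or sqrt(-c_i)
            result.append([x / s for x in v])
    return result

# Hypothetical form on R^3, degenerate in the third coordinate.
def product(x, y):
    return 4 * x[0] * y[0] - 16 * x[1] * y[1]

basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
new_basis = normalize(basis, product)
print([product(v, v) for v in new_basis])
```

Dividing by $\sqrt{|c_i|}$ scales $\langle v_i, v_i\rangle$ by $1/|c_i|$, which leaves the sign of $c_i$ unchanged.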

Let $\{v_1, \ldots, v_n\}$ be an orthonormal basis of $V$, for our scalar product.
If $X$ is the coordinate vector of an element of $V$, then in terms of our
orthonormal basis,

$$f(X) = x_1^2 + \cdots + x_r^2 - x_{r+1}^2 - \cdots - x_s^2.$$

By using an orthonormal basis, we see the number of positive and
negative terms particularly clearly. In proving that the number of these does
not depend on the orthonormal basis, we shall first deal with the number
of terms which disappear, and we shall give a geometric interpretation
for it.

Theorem 8.1. Let $V$ be a finite dimensional vector space over $R$, with a
scalar product. Assume $\dim V > 0$. Let $V_0$ be the subspace of $V$
consisting of all vectors $v \in V$ such that $\langle v, w\rangle = 0$ for all $w \in V$. Let $\{v_1, \ldots, v_n\}$
be an orthogonal basis for $V$. Then the number of integers $i$ such that
$\langle v_i, v_i\rangle = 0$ is equal to the dimension of $V_0$.

Proof. We suppose $\{v_1, \ldots, v_n\}$ so ordered that

$$\langle v_1, v_1\rangle \neq 0, \ \ldots, \ \langle v_s, v_s\rangle \neq 0 \quad\text{but}\quad \langle v_i, v_i\rangle = 0 \ \text{ if } i > s.$$

Since $\{v_1, \ldots, v_n\}$ is an orthogonal basis, it is then clear that $v_{s+1}, \ldots, v_n$ lie
in $V_0$. Let $v$ be an element of $V_0$, and write

$$v = x_1v_1 + \cdots + x_sv_s + \cdots + x_nv_n$$

with $x_i \in R$. Taking the scalar product with any $v_j$ for $j \leq s$, we find

$$0 = \langle v, v_j\rangle = x_j\langle v_j, v_j\rangle.$$

Since $\langle v_j, v_j\rangle \neq 0$, it follows that $x_j = 0$. Hence $v$ lies in the space
generated by $v_{s+1}, \ldots, v_n$. We conclude that $v_{s+1}, \ldots, v_n$ form a basis of $V_0$.

In Theorem 8.1, the dimension of $V_0$ is called the index of nullity of
the form. We see that the form is non-degenerate if and only if its index
of nullity is 0.

Theorem 8.2 (Sylvester’s theorem). Let $V$ be a finite dimensional vector
space over $R$, with a scalar product. There exists an integer $r \geq 0$
having the following property. If $\{v_1, \ldots, v_n\}$ is an orthogonal basis of $V$,
then there are precisely $r$ integers $i$ such that $\langle v_i, v_i\rangle > 0$.

Proof. Let $\{v_1, \ldots, v_n\}$ and $\{w_1, \ldots, w_n\}$ be orthogonal bases. We
suppose their elements so arranged that

$$\begin{aligned}
\langle v_i, v_i\rangle &> 0 \quad \text{if } 1 \leq i \leq r,\\
\langle v_i, v_i\rangle &< 0 \quad \text{if } r + 1 \leq i \leq s,\\
\langle v_i, v_i\rangle &= 0 \quad \text{if } s + 1 \leq i \leq n.
\end{aligned}$$

Similarly,

$$\begin{aligned}
\langle w_i, w_i\rangle &> 0 \quad \text{if } 1 \leq i \leq r',\\
\langle w_i, w_i\rangle &< 0 \quad \text{if } r' + 1 \leq i \leq s',\\
\langle w_i, w_i\rangle &= 0 \quad \text{if } s' + 1 \leq i \leq n.
\end{aligned}$$

We shall first prove that

$$v_1, \ldots, v_r, w_{r'+1}, \ldots, w_n$$

are linearly independent.

Suppose we have a relation

$$x_1v_1 + \cdots + x_rv_r + y_{r'+1}w_{r'+1} + \cdots + y_nw_n = 0.$$

Then

$$x_1v_1 + \cdots + x_rv_r = -(y_{r'+1}w_{r'+1} + \cdots + y_nw_n).$$

Let $c_i = \langle v_i, v_i\rangle$ and $d_i = \langle w_i, w_i\rangle$ for all $i$. Taking the scalar product of
each side of the preceding equation with itself, we obtain

$$c_1x_1^2 + \cdots + c_rx_r^2 = d_{r'+1}y_{r'+1}^2 + \cdots + d_{s'}y_{s'}^2.$$

The left-hand side is $\geq 0$. The right-hand side is $\leq 0$. The only way
this can hold is that they are both equal to 0, and this holds only if

$$x_1 = \cdots = x_r = 0.$$

From the linear independence of $w_{r'+1}, \ldots, w_n$ it follows that all
coefficients $y_{r'+1}, \ldots, y_n$ are also equal to 0.

Since $\dim V = n$, we now conclude that

$$r + n - r' \leq n$$

or in other words, $r \leq r'$. But the situation holding with respect to our
two bases is symmetric, and thus $r' \leq r$. It follows that $r' = r$, and
Sylvester’s theorem is proved.

The integer $r$ of Sylvester’s theorem is called the index of positivity of
the scalar product.
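
Sylvester's theorem can be illustrated by counting signs for two different orthogonal bases of the same form. The sketch below uses a hypothetical symmetric matrix on $R^2$ and two orthogonal bases found by hand; it checks one instance of the theorem and is of course not a proof.

```python
def form(C, x, y):
    """Scalar product <x, y> = tx C y for a symmetric matrix C."""
    return sum(x[i] * C[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))

def signs(C, basis):
    """Return (#positive, #negative, #zero) among <v_i, v_i> for an orthogonal basis."""
    for i, v in enumerate(basis):
        for w in basis[i + 1:]:
            assert form(C, v, w) == 0, "basis is not orthogonal"
    values = [form(C, v, v) for v in basis]
    return (sum(c > 0 for c in values),
            sum(c < 0 for c in values),
            sum(c == 0 for c in values))

C = [[1, 2], [2, 1]]            # hypothetical symmetric matrix on R^2
basis_1 = [[1, 1], [1, -1]]     # squares: 6 and -2
basis_2 = [[1, 0], [2, -1]]     # squares: 1 and -3
print(signs(C, basis_1), signs(C, basis_2))
```

The individual values $\langle v_i, v_i\rangle$ differ between the two bases, but the counts of positive, negative and zero values agree.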


V, §8. EXERCISES 

1. Determine the index of nullity and index of positivity for each product
determined by the following symmetric matrices, on $R^2$.

2. Let $V$ be a finite dimensional space over $R$, and let $\langle\ ,\ \rangle$ be a scalar product
on $V$. Show that $V$ admits a direct sum decomposition

$$V = V^+ \oplus V^- \oplus V_0,$$

where $V_0$ is defined as in Theorem 8.1, and where the product is positive
definite on $V^+$ and negative definite on $V^-$. (This means that

$$\langle v, v\rangle > 0 \quad \text{for all } v \in V^+,\ v \neq 0$$

$$\langle v, v\rangle < 0 \quad \text{for all } v \in V^-,\ v \neq 0.)$$

Show that the dimensions of the spaces $V^+$, $V^-$ are the same in all such
decompositions.

3. Let $V$ be the vector space over $R$ of $2 \times 2$ real symmetric matrices.

(a) Given a symmetric matrix

$$A = \begin{pmatrix} x & y \\ y & z \end{pmatrix}$$

show that $(x, y, z)$ are the coordinates of $A$ with respect to some basis of
the vector space of all $2 \times 2$ symmetric matrices. Which basis?

(b) Let

$$f(A) = xz - yy = xz - y^2.$$

If we view $(x, y, z)$ as the coordinates of $A$ then we see that $f$ is a
quadratic form on $V$. Note that $f(A)$ is the determinant of $A$, which could be
defined here ad hoc in a simple way.

(c) Let $W$ be the subspace of $V$ consisting of all $A$ such that $\operatorname{tr}(A) = 0$.
Show that for $A \in W$ and $A \neq O$ we have $f(A) < 0$. This means that the
quadratic form is negative definite on $W$.



CHAPTER VI 


Determinants 


We have worked with vectors for some time, and we have often felt the 
need of a method to determine when vectors are linearly independent. 
Up to now, the only method available to us was to solve a system of 
linear equations by the elimination method. In this chapter, we shall 
exhibit a very efficient computational method to solve linear equations, 
and determine when vectors are linearly independent. 

The cases of $2 \times 2$ and $3 \times 3$ determinants will be carried out
separately in full, because the general case of $n \times n$ determinants involves
notation which adds to the difficulties of understanding determinants. In a
first reading, we suggest omitting the proofs in the general case.


VI, §1. DETERMINANTS OF ORDER 2 


Before stating the general properties of an arbitrary determinant, we shall 
consider a special case. 

Let

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

be a $2 \times 2$ matrix in a field $K$. We define its determinant to be
$ad - bc$. Thus the determinant is an element of $K$. We denote it by

$$\begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc.$$

For example, the determinant of the matrix

$$\begin{pmatrix} 2 & 1 \\ 1 & 4 \end{pmatrix}$$

is equal to $2 \cdot 4 - 1 \cdot 1 = 7$. The determinant of

$$\begin{pmatrix} -2 & -3 \\ 4 & 5 \end{pmatrix}$$

is equal to $(-2) \cdot 5 - (-3) \cdot 4 = -10 + 12 = 2$.
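
The definition $ad - bc$ is a one-line function. The sketch below evaluates it on the two numerical examples, with the matrix entries read off from the arithmetic displayed in the text; the function name `det2` is hypothetical.

```python
def det2(a, b, c, d):
    """Determinant of the 2 x 2 matrix with rows (a, b) and (c, d)."""
    return a * d - b * c

print(det2(2, 1, 1, 4))      # 2*4 - 1*1
print(det2(-2, -3, 4, 5))    # (-2)*5 - (-3)*4
```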

The determinant can be viewed as a function of the matrix $A$. It can
also be viewed as a function of its two columns. Let these be $A^1$ and $A^2$
as usual. Then we write the determinant as

$$D(A), \quad \operatorname{Det}(A), \quad \text{or} \quad D(A^1, A^2).$$

The following properties are easily verified by direct computation, 
which you should carry out completely. 

As a function of the column vectors, the determinant is linear. 

This means: let $b'$, $d'$ be two numbers. Then

$$D\begin{pmatrix} a & b + b' \\ c & d + d' \end{pmatrix} = D\begin{pmatrix} a & b \\ c & d \end{pmatrix} + D\begin{pmatrix} a & b' \\ c & d' \end{pmatrix}.$$

Furthermore, if $t$ is a number, then

$$D\begin{pmatrix} a & tb \\ c & td \end{pmatrix} = tD\begin{pmatrix} a & b \\ c & d \end{pmatrix}.$$

The analogous properties also hold with respect to the first column.
We give the proof for the additivity with respect to the second column
to show how easy it is. Namely, we have

$$a(d + d') - c(b + b') = ad + ad' - cb - cb' = ad - bc + ad' - b'c,$$

which is precisely the desired additivity. Thus in the terminology of
Chapter V, §4 we may say that the determinant is bilinear.

If the two columns are equal, then the determinant is equal to 0.

If $A$ is the unit matrix,

$$A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},$$

then $\operatorname{Det}(A) = 1$.

The determinant also satisfies the following additional properties.

If one adds a multiple of one column to the other, then the value of the
determinant does not change.

In other words, let $t$ be a number. The determinant of the matrix

$$\begin{pmatrix} a + tb & b \\ c + td & d \end{pmatrix}$$

is the same as $D(A)$, and similarly when we add a multiple of the first
column to the second.

If the two columns are interchanged, then the determinant changes by a
sign.

In other words, we have

$$\begin{vmatrix} a & b \\ c & d \end{vmatrix} = -\begin{vmatrix} b & a \\ d & c \end{vmatrix}.$$

The determinant of $A$ is equal to the determinant of its transpose, i.e.

$$D(A) = D({}^tA).$$

Explicitly, we have

$$\begin{vmatrix} a & b \\ c & d \end{vmatrix} = \begin{vmatrix} a & c \\ b & d \end{vmatrix}.$$
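
The properties listed in this section are all polynomial identities in the entries, so they can be spot-checked on sample integers. The following sketch treats the determinant as a function `D` of its two columns; the sample columns and the scalar `t` are hypothetical, and a check on one data set does not replace the direct computation asked for in the text.

```python
def D(col1, col2):
    """2 x 2 determinant as a function of its columns (a, c) and (b, d)."""
    (a, c), (b, d) = col1, col2
    return a * d - b * c

A1, A2, B2, t = (3, -1), (2, 5), (7, 4), 6     # hypothetical sample data

def add(u, v):
    return (u[0] + v[0], u[1] + v[1])

def scale(s, u):
    return (s * u[0], s * u[1])

checks = {
    "additive in column 2": D(A1, add(A2, B2)) == D(A1, A2) + D(A1, B2),
    "homogeneous in column 2": D(A1, scale(t, A2)) == t * D(A1, A2),
    "equal columns give 0": D(A1, A1) == 0,
    "unit matrix gives 1": D((1, 0), (0, 1)) == 1,
    "adding a multiple": D(add(A1, scale(t, A2)), A2) == D(A1, A2),
    "interchange changes sign": D(A2, A1) == -D(A1, A2),
    # The transpose has columns (a, b) and (c, d).
    "transpose": D((A1[0], A2[0]), (A1[1], A2[1])) == D(A1, A2),
}
print(checks)
```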




The vectors $\begin{pmatrix} a \\ c \end{pmatrix}$ and $\begin{pmatrix} b \\ d \end{pmatrix}$ are linearly dependent if and only if the
determinant $ad - bc$ is equal to 0.

We give a direct proof for this property. Assume that there exist
numbers $x$, $y$ not both 0 such that

$$xa + yb = 0,$$
$$xc + yd = 0.$$

Say $x \neq 0$. Multiply the first equation by $d$, multiply the second by $b$,
and subtract. We obtain

$$xad - xbc = 0,$$

whence $x(ad - bc) = 0$. It follows that $ad - bc = 0$. Conversely, assume
that $ad - bc = 0$, and assume that not both vectors $(a, c)$ and $(b, d)$ are
the zero vectors (otherwise, they are obviously linearly dependent). Say
$a \neq 0$. Let $y = -a$ and $x = b$. Then we see at once that

$$xa + yb = 0,$$
$$xc + yd = 0,$$

so that $(a, c)$ and $(b, d)$ are linearly dependent, thus proving our
assertion.


VI, §2. EXISTENCE OF DETERMINANTS 

We shall define determinants by induction, and give a formula for
computing them at the same time. We first deal with the $3 \times 3$ case.

We have already defined $2 \times 2$ determinants. Let

$$A = (a_{ij}) = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}$$

be a $3 \times 3$ matrix. We define its determinant according to the formula
known as the expansion by a row, say the first row. That is, we define

$$(*) \qquad \operatorname{Det}(A) = a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12}\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13}\begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix} = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}.$$

We may describe this sum as follows. Let $A_{ij}$ be the matrix obtained
from $A$ by deleting the $i$-th row and the $j$-th column. Then the sum
expressing $\operatorname{Det}(A)$ can be written

$$a_{11}\operatorname{Det}(A_{11}) - a_{12}\operatorname{Det}(A_{12}) + a_{13}\operatorname{Det}(A_{13}).$$

In other words, each term consists of the product of an element of the
first row and the determinant of the $2 \times 2$ matrix obtained by deleting
the first row and the $j$-th column, and putting the appropriate sign to
this term as shown.


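Example 1. Let

$$A = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 1 & 4 \\ -3 & 2 & 5 \end{pmatrix}.$$

Then

$$A_{11} = \begin{pmatrix} 1 & 4 \\ 2 & 5 \end{pmatrix}, \qquad A_{12} = \begin{pmatrix} 1 & 4 \\ -3 & 5 \end{pmatrix}, \qquad A_{13} = \begin{pmatrix} 1 & 1 \\ -3 & 2 \end{pmatrix},$$

and our formula for the determinant of $A$ yields

$$\operatorname{Det}(A) = 2\begin{vmatrix} 1 & 4 \\ 2 & 5 \end{vmatrix} - 1\begin{vmatrix} 1 & 4 \\ -3 & 5 \end{vmatrix} + 0\begin{vmatrix} 1 & 1 \\ -3 & 2 \end{vmatrix}$$

$$= 2(5 - 8) - 1(5 + 12) + 0$$
$$= -23.$$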

The determinant of a $3 \times 3$ matrix can be written as

$$D(A) = \operatorname{Det}(A) = D(A^1, A^2, A^3).$$

We use this last expression if we wish to consider the determinant as a
function of the columns of $A$.

Later we shall define the determinant of an $n \times n$ matrix, and we use
the same notation

$$|A| = D(A) = \operatorname{Det}(A) = D(A^1, \ldots, A^n).$$

Already in the $3 \times 3$ case we can prove the properties expressed in the
next theorem, which we state, however, in the general case.

Theorem 2.1. The determinant satisfies the following properties:

1. As a function of each column vector, the determinant is linear, i.e. if
the $j$-th column $A^j$ is equal to a sum of two column vectors, say
$A^j = C + C'$, then

$$D(A^1, \ldots, C + C', \ldots, A^n) = D(A^1, \ldots, C, \ldots, A^n) + D(A^1, \ldots, C', \ldots, A^n).$$

Furthermore, if $t$ is a number, then

$$D(A^1, \ldots, tA^j, \ldots, A^n) = tD(A^1, \ldots, A^j, \ldots, A^n).$$

2. If two adjacent columns are equal, i.e. if $A^j = A^{j+1}$ for some
$j = 1, \ldots, n - 1$, then the determinant $D(A)$ is equal to 0.

3. If $I$ is the unit matrix, then $D(I) = 1$.

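Proof (in the $3 \times 3$ case). The proof is by direct computations.
Suppose say that the first column is a sum of two columns:

$$A^1 = B + C, \quad \text{that is,} \quad \begin{pmatrix} a_{11} \\ a_{21} \\ a_{31} \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} + \begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix}.$$

Substituting in each term of (*), we see that each term splits into a sum
of two terms corresponding to $B$ and $C$. For instance,

$$a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} = b_1\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} + c_1\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix},$$

$$a_{12}\begin{vmatrix} b_2 + c_2 & a_{23} \\ b_3 + c_3 & a_{33} \end{vmatrix} = a_{12}\begin{vmatrix} b_2 & a_{23} \\ b_3 & a_{33} \end{vmatrix} + a_{12}\begin{vmatrix} c_2 & a_{23} \\ c_3 & a_{33} \end{vmatrix},$$

and similarly for the third term. The proof with respect to the other
columns is analogous. Furthermore, if $t$ is a number, then

$$\operatorname{Det}(tA^1, A^2, A^3) = ta_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12}\begin{vmatrix} ta_{21} & a_{23} \\ ta_{31} & a_{33} \end{vmatrix} + a_{13}\begin{vmatrix} ta_{21} & a_{22} \\ ta_{31} & a_{32} \end{vmatrix}$$

$$= t\operatorname{Det}(A^1, A^2, A^3)$$

because each $2 \times 2$ determinant is linear in the first column, and we can
take $t$ outside each one of the second and third terms. Again the proof
is similar with respect to the other columns. A direct substitution shows
that if two adjacent columns are equal, then formula (*) yields 0 for the
determinant. Finally, one sees at once that if $A$ is the unit matrix, then
$\operatorname{Det}(A) = 1$. Thus the three properties are verified.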

In the above proof, we see that the properties of $2 \times 2$ determinants
are used to prove the properties of $3 \times 3$ determinants.

Furthermore, there is no particular reason why we selected the
expansion according to the first row. We can also use the second row, and
write a similar sum, namely:

$$-a_{21}\begin{vmatrix} a_{12} & a_{13} \\ a_{32} & a_{33} \end{vmatrix} + a_{22}\begin{vmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{vmatrix} - a_{23}\begin{vmatrix} a_{11} & a_{12} \\ a_{31} & a_{32} \end{vmatrix}$$

$$= -a_{21}\operatorname{Det}(A_{21}) + a_{22}\operatorname{Det}(A_{22}) - a_{23}\operatorname{Det}(A_{23}).$$

Again, each term is the product of $a_{2j}$ times the determinant of the $2 \times 2$
matrix obtained by deleting the second row and $j$-th column, and putting
the appropriate sign in front of each term. This sign is determined
according to the pattern:

$$\begin{pmatrix} + & - & + \\ - & + & - \\ + & - & + \end{pmatrix}$$

One can see directly that the determinant can be expanded according to
any row by multiplying out all the terms, and expanding the $2 \times 2$
determinants, thus obtaining the determinant as an alternating sum of six
terms:

$$(**) \qquad \operatorname{Det}(A) = a_{11}a_{22}a_{33} - a_{11}a_{32}a_{23} - a_{12}a_{21}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31}.$$

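Furthermore, we can also expand according to columns following the
same principle. For instance, expanding out according to the first
column:

$$a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{21}\begin{vmatrix} a_{12} & a_{13} \\ a_{32} & a_{33} \end{vmatrix} + a_{31}\begin{vmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{vmatrix}$$

yields precisely the same six terms as in (**).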

The reader should now look at least at the general expression given
for the expansion according to a row or column in Theorem 2.4,
interpreting $i$, $j$ to be 1, 2, or 3 for the $3 \times 3$ case.

Since the determinant of a $3 \times 3$ matrix is linear as a function of its
columns, we may say that it is trilinear; just as a $2 \times 2$ determinant is
bilinear. In the $n \times n$ case, we would say $n$-linear, or multilinear.

In the case of $3 \times 3$ determinants, we have the following result.

Theorem 2.2. The determinant satisfies the rule for expansion according
to rows and columns, and $\operatorname{Det}(A) = \operatorname{Det}({}^tA)$. In other words, the
determinant of a matrix is equal to the determinant of its transpose.

This last assertion follows because taking the transpose of a matrix
changes rows into columns and vice versa.

Example 2. Compute the determinant

$$\begin{vmatrix} 3 & 0 & 1 \\ 1 & 2 & 5 \\ -1 & 4 & 2 \end{vmatrix}$$

by expanding according to the second column.
The determinant is equal to

$$2\begin{vmatrix} 3 & 1 \\ -1 & 2 \end{vmatrix} - 4\begin{vmatrix} 3 & 1 \\ 1 & 5 \end{vmatrix} = 2(6 - (-1)) - 4(15 - 1) = -42.$$

Note that the presence of a 0 in the second column eliminates one term
in the expansion, since this term would be 0.

We can also compute the above determinant by expanding according
to the third column, namely the determinant is equal to

$$+1\begin{vmatrix} 1 & 2 \\ -1 & 4 \end{vmatrix} - 5\begin{vmatrix} 3 & 0 \\ -1 & 4 \end{vmatrix} + 2\begin{vmatrix} 3 & 0 \\ 1 & 2 \end{vmatrix} = -42.$$
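
The agreement between the expansions of Example 2 can be confirmed for every column at once. The sketch below expands a $3 \times 3$ determinant according to a chosen column, with the sign $(-1)^{i+j}$ following the pattern shown earlier; the helper names are hypothetical.

```python
def det2(m):
    """Determinant of a 2 x 2 matrix given as a list of rows."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def minor(A, i, j):
    """Delete row i and column j (0-based) from a 3 x 3 matrix."""
    return [[A[r][c] for c in range(3) if c != j] for r in range(3) if r != i]

def expand_column(A, j):
    """Expand a 3 x 3 determinant according to column j (0-based)."""
    return sum((-1) ** (i + j) * A[i][j] * det2(minor(A, i, j)) for i in range(3))

A = [[3, 0, 1], [1, 2, 5], [-1, 4, 2]]   # matrix of Example 2
print([expand_column(A, j) for j in range(3)])
```

The parity of $i + j$ is the same for 0-based and 1-based indices, so the signs match those in the text.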


The $n \times n$ case

Let

$$F: K^n \times \cdots \times K^n \to K$$

be a function of $n$ variables, where each variable ranges over $K^n$. We say
that $F$ is multilinear if $F$ satisfies the first property listed in Theorem 2.1,
that is

$$F(A^1, \ldots, C + C', \ldots, A^n) = F(A^1, \ldots, C, \ldots, A^n) + F(A^1, \ldots, C', \ldots, A^n),$$

$$F(A^1, \ldots, tC, \ldots, A^n) = tF(A^1, \ldots, C, \ldots, A^n).$$

This means that if we consider some index $j$, and fix $A^k$ for $k \neq j$, then
the function $X^j \mapsto F(A^1, \ldots, X^j, \ldots, A^n)$ is linear in the $j$-th variable.

We say that $F$ is alternating if whenever $A^j = A^{j+1}$ for some $j$ we
have

$$F(A^1, \ldots, A^j, A^j, \ldots, A^n) = 0.$$

This is the second property of determinants.

One fundamental theorem of this chapter can be formulated as follows.

Theorem 2.3. There exists a multilinear alternating function

$$F: K^n \times \cdots \times K^n \to K$$

such that $F(I) = 1$. Such a function is uniquely determined by these
three properties.

The uniqueness proof will be postponed to Theorem 7.2. We have
already proved existence in case $n = 2$ and $n = 3$. We shall now prove the
existence in general.

The general case of $n \times n$ determinants is done by induction. Suppose
that we have been able to define determinants for $(n - 1) \times (n - 1)$
matrices. Let $i$, $j$ be a pair of integers between 1 and $n$. If we cross out
the $i$-th row and $j$-th column in the $n \times n$ matrix $A$, we obtain an
$(n - 1) \times (n - 1)$ matrix, which we denote by $A_{ij}$. It looks like this
(the $i$-th row and the $j$-th column, which meet at $a_{ij}$, are crossed out):

$$\begin{pmatrix} a_{11} & \cdots & a_{1j} & \cdots & a_{1n} \\ \vdots & & \vdots & & \vdots \\ a_{i1} & \cdots & a_{ij} & \cdots & a_{in} \\ \vdots & & \vdots & & \vdots \\ a_{n1} & \cdots & a_{nj} & \cdots & a_{nn} \end{pmatrix}$$


We give an expression for the determinant of an $n \times n$ matrix in terms
of determinants of $(n - 1) \times (n - 1)$ matrices. Let $i$ be an integer,
$1 \leq i \leq n$. We define

$$D(A) = (-1)^{i+1}a_{i1}\operatorname{Det}(A_{i1}) + \cdots + (-1)^{i+n}a_{in}\operatorname{Det}(A_{in}).$$

Each $A_{ij}$ is an $(n - 1) \times (n - 1)$ matrix.

This sum can be described in words. For each element of the $i$-th
row, we have a contribution of one term in the sum. This term is equal
to $+$ or $-$ the product of this element, times the determinant of the
matrix obtained from $A$ by deleting the $i$-th row and the corresponding
column. The sign $+$ or $-$ is determined according to the chess-board
pattern:

$$\begin{pmatrix} + & - & + & - & \cdots \\ - & + & - & + & \cdots \\ + & - & + & - & \cdots \\ \vdots & & & & \end{pmatrix}$$

This sum is called the expansion of the determinant according to the $i$-th
row. We shall prove that this function $D$ satisfies properties 1, 2, and 3.
Note that $D(A)$ is a sum of the terms

$$(-1)^{i+j}a_{ij}\operatorname{Det}(A_{ij})$$

as $j$ ranges from 1 to $n$.

1. Consider D as a function of the k-th column, and consider any 
term 

( — l) i+Ja ij Det(y4 (J ). 

If j =£ k, then a i} does not depend on the k -th column, and Det (A i} ) 
depends linearly on the Ac-th column. If j = k , then a u depends linearly 
on the k-th column, and Det(^ 0 ) does not depend on the k-th column. 
In any case, our term depends linearly on the Ac-th column. Since D(A) 
is a sum of such terms, it depends linearly on the Ac-th column, and 
property 1 follows. 

2. Suppose two adjacent columns of $A$ are equal, namely $A^k = A^{k+1}$.
Let $j$ be an index $\ne k$ or $k+1$. Then the matrix $A_{ij}$ has two adjacent
equal columns, and hence its determinant is equal to 0. Thus the term
corresponding to an index $j \ne k$ or $k+1$ gives a zero contribution to
$D(A)$. The other two terms can be written

$$(-1)^{i+k} a_{ik} \operatorname{Det}(A_{ik}) + (-1)^{i+k+1} a_{i,k+1} \operatorname{Det}(A_{i,k+1}).$$

The two matrices $A_{ik}$ and $A_{i,k+1}$ are equal because of our assumption
that the $k$-th column of $A$ is equal to the $(k+1)$-th column. Similarly,
$a_{ik} = a_{i,k+1}$. Hence these two terms cancel since they occur with opposite
signs. This proves property 2.

3. Let $A$ be the unit matrix. Then $a_{ij} = 0$ unless $i = j$, in which case
$a_{ii} = 1$. Each $A_{ii}$ is the unit $(n-1) \times (n-1)$ matrix. The only term in
the sum which gives a non-zero contribution is

$$(-1)^{i+i} a_{ii} \operatorname{Det}(A_{ii}),$$

which is equal to 1. This proves property 3.

Example 3. We wish to compute the determinant

$$\begin{vmatrix} 1 & 2 & 1 \\ -1 & 3 & 1 \\ 0 & 1 & 5 \end{vmatrix}.$$

We use the expansion according to the third row (because it has a zero
in it), and only two non-zero terms occur:

$$(-1) \begin{vmatrix} 1 & 1 \\ -1 & 1 \end{vmatrix} + 5 \begin{vmatrix} 1 & 2 \\ -1 & 3 \end{vmatrix}.$$

We can compute explicitly the $2 \times 2$ determinants as in §1, and thus we
get the value $(-1)(2) + 5(5) = 23$ for the determinant of our $3 \times 3$ matrix.

It will be shown in a subsequent section that the determinant of a
matrix $A$ is equal to the determinant of its transpose. When we have
proved this result, we will obtain:

Theorem 2.4. Determinants satisfy the rule for expansion according to
rows and columns. For any column $A^j$ of the matrix $A = (a_{ij})$, we have

$$D(A) = (-1)^{1+j} a_{1j} D(A_{1j}) + \cdots + (-1)^{n+j} a_{nj} D(A_{nj}).$$

In practice, the computation of a determinant is often done by using
an expansion according to some row or column.
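
The expansion according to a row translates directly into a short recursive routine. The following Python sketch is only an illustration: the names `minor` and `det_by_row` are ours, indices are 0-based (which does not change the parity of $i + j$), and no attempt is made at efficiency.

```python
def minor(a, i, j):
    """Return A_ij: the matrix a with row i and column j deleted (0-based)."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(a) if r != i]


def det_by_row(a, i=0):
    """Determinant of a square matrix by expansion according to row i."""
    n = len(a)
    if n == 1:
        return a[0][0]
    total = 0
    for j in range(n):
        if a[i][j] != 0:  # zero entries contribute nothing
            total += (-1) ** (i + j) * a[i][j] * det_by_row(minor(a, i, j))
    return total


# The matrix of Example 3, expanded according to the third row.
example3 = [[1, 2, 1], [-1, 3, 1], [0, 1, 5]]
print(det_by_row(example3, i=2))  # 23
```

Expanding according to any other row of the same matrix returns the same value, as the uniqueness of the determinant requires.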


VI, §2. EXERCISES 

1. Let $c$ be a number and let $A$ be a $3 \times 3$ matrix. Show that

$$D(cA) = c^3 D(A).$$

2. Let $c$ be a number and let $A$ be an $n \times n$ matrix. Show that

$$D(cA) = c^n D(A).$$


VI, §3. ADDITIONAL PROPERTIES OF DETERMINANTS 

To compute determinants efficiently, we need additional properties which
will be deduced simply from properties 1, 2, 3 of Theorem 2.1. There is
no change here between the $3 \times 3$ and $n \times n$ case, so we write $n$. But
again, readers may read $n = 3$ if they wish, the first time around.

4. Let $i$, $j$ be integers with $1 \le i, j \le n$ and $i \ne j$. If the $i$-th and $j$-th
columns are interchanged, then the determinant changes by a sign.

Proof. We prove this first when we interchange the $j$-th and $(j+1)$-th
columns. In the matrix $A$, we replace the $j$-th and $(j+1)$-th columns by
$A^j + A^{j+1}$. We obtain a matrix with two equal adjacent columns and by
property 2 we have:

$$0 = D(\dots, A^j + A^{j+1}, A^j + A^{j+1}, \dots).$$

Expanding out using property 1 repeatedly yields

$$0 = D(\dots, A^j, A^j, \dots) + D(\dots, A^{j+1}, A^j, \dots) + D(\dots, A^j, A^{j+1}, \dots) + D(\dots, A^{j+1}, A^{j+1}, \dots).$$

Using property 2, we see that two of these four terms are equal to 0,
and hence that

$$0 = D(\dots, A^{j+1}, A^j, \dots) + D(\dots, A^j, A^{j+1}, \dots).$$

In this last sum, one term must be equal to minus the other, as desired.

Before we prove the property for the interchange of any two columns
we prove another one.

5. If two columns $A^j$, $A^i$ of $A$ are equal, $j \ne i$, then the determinant of $A$
is equal to 0.

Proof. Assume that two columns of the matrix $A$ are equal. We can
change the matrix by a successive interchange of adjacent columns until
we obtain a matrix with equal adjacent columns. (This could be proved
formally by induction.) Each time that we make such an adjacent
interchange, the determinant changes by a sign, which does not affect its
being 0 or not. Hence we conclude by property 2 that $D(A) = 0$ if two
columns are equal.

We can now return to the proof of 4 for any $i \ne j$. Exactly the same
argument as given in the proof of 4 for $j$ and $j + 1$ works in the general
case if we use property 5. We just note that

$$0 = D(\dots, A^i + A^j, \dots, A^i + A^j, \dots)$$

and expand as before. This concludes the proof of 4.

6. If one adds a scalar multiple of one column to another then the value
of the determinant does not change.

Proof. Consider two distinct columns, say the $k$-th and $j$-th columns
$A^k$ and $A^j$ with $k \ne j$. Let $t$ be a scalar. We add $tA^j$ to $A^k$. By property
1, the determinant becomes

$$D(\dots, A^k + tA^j, \dots) = D(\dots, A^k, \dots) + D(\dots, tA^j, \dots),$$

where in each of the three determinants the column written out stands in
the $k$-th place. But $D(\dots, A^k, \dots)$ is simply $D(A)$.
Furthermore,

$$D(\dots, tA^j, \dots) = t D(\dots, A^j, \dots),$$

again with the indicated column in the $k$-th place.
Since $k \ne j$, the determinant on the right has two equal columns, because
$A^j$ occurs in the $k$-th place and also in the $j$-th place. Hence it is equal
to 0. Hence

$$D(\dots, A^k + tA^j, \dots) = D(\dots, A^k, \dots),$$

thereby proving our property 6.

With the above means at our disposal, we can now compute $3 \times 3$
determinants very efficiently. In doing so, we apply the operations
described in property 6, which we now see are valid for rows or columns,
since $\operatorname{Det}(A) = \operatorname{Det}({}^tA)$. We try to make as many entries of the matrix $A$
as possible equal to 0. We try especially to make all but one element of a column
(or row) equal to 0, and then expand according to that column (or row).
The expansion will contain only one term, and reduces our computation
to a $2 \times 2$ determinant.

Example 1. Compute the determinant

$$\begin{vmatrix} 3 & 0 & 1 \\ 1 & 2 & 5 \\ -1 & 4 & 2 \end{vmatrix}.$$

We already have 0 in the first row. We subtract twice the second row
from the third row. Our determinant is then equal to

$$\begin{vmatrix} 3 & 0 & 1 \\ 1 & 2 & 5 \\ -3 & 0 & -8 \end{vmatrix}.$$

We expand according to the second column. The expansion has only
one term $\ne 0$, with a + sign, and that is:

$$2 \begin{vmatrix} 3 & 1 \\ -3 & -8 \end{vmatrix}.$$

The $2 \times 2$ determinant can be evaluated by our definition $ad - bc$, and
we find $2(-24 - (-3)) = -42$.

Example 2. We wish to compute the determinant

$$\begin{vmatrix} 1 & 3 & 1 & 1 \\ 2 & 1 & 5 & 2 \\ 1 & -1 & 2 & 3 \\ 4 & 1 & -3 & 7 \end{vmatrix}.$$

We add the third row to the second row, and then add the third row to
the fourth row. This yields

$$\begin{vmatrix} 1 & 3 & 1 & 1 \\ 3 & 0 & 7 & 5 \\ 1 & -1 & 2 & 3 \\ 4 & 1 & -3 & 7 \end{vmatrix}
= \begin{vmatrix} 1 & 3 & 1 & 1 \\ 3 & 0 & 7 & 5 \\ 1 & -1 & 2 & 3 \\ 5 & 0 & -1 & 10 \end{vmatrix}.$$

We then add three times the third row to the first row and get

$$\begin{vmatrix} 4 & 0 & 7 & 10 \\ 3 & 0 & 7 & 5 \\ 1 & -1 & 2 & 3 \\ 5 & 0 & -1 & 10 \end{vmatrix},$$

which we expand according to the second column. There is only one
term, namely

$$\begin{vmatrix} 4 & 7 & 10 \\ 3 & 7 & 5 \\ 5 & -1 & 10 \end{vmatrix}.$$

We subtract twice the second row from the first row, and then from the
third row, yielding

$$\begin{vmatrix} -2 & -7 & 0 \\ 3 & 7 & 5 \\ -1 & -15 & 0 \end{vmatrix},$$

which we expand according to the third column, and get

$$-5(30 - 7) = -5(23) = -115.$$
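
The strategy of Examples 1 and 2 (add multiples of one row to another, interchange rows when needed, then read off a determinant that has become easy) can be carried out mechanically. The sketch below is one possible way to do it, not the book's procedure: it reduces the matrix to triangular form by row operations, tracks the sign changes coming from interchanges (property 4), and multiplies the diagonal entries. The function name `det_by_elimination` is ours, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction


def det_by_elimination(rows):
    """Determinant via row operations, using exact rational arithmetic."""
    a = [[Fraction(x) for x in row] for row in rows]
    n = len(a)
    sign = 1
    for c in range(n):
        # Look for a row at or below the diagonal with a non-zero entry in column c.
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)  # no pivot in this column: the determinant is 0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]  # an interchange changes the sign
            sign = -sign
        for r in range(c + 1, n):
            t = a[r][c] / a[c][c]
            # Subtracting a multiple of one row from another leaves D unchanged.
            a[r] = [x - t * y for x, y in zip(a[r], a[c])]
    result = Fraction(sign)
    for k in range(n):
        result *= a[k][k]  # determinant of a triangular matrix
    return result


example1 = [[3, 0, 1], [1, 2, 5], [-1, 4, 2]]
example2 = [[1, 3, 1, 1], [2, 1, 5, 2], [1, -1, 2, 3], [4, 1, -3, 7]]
print(det_by_elimination(example1))  # -42
print(det_by_elimination(example2))  # -115
```

The two printed values agree with the hand computations of Examples 1 and 2.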

VI, §3. EXERCISES 

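1. Compute the following determinants.

(a) $\begin{vmatrix} 2 & 1 & 2 \\ 0 & 3 & -1 \\ 4 & 1 & 1 \end{vmatrix}$
(b) $\begin{vmatrix} 3 & -1 & 5 \\ -1 & 2 & 1 \\ -2 & 4 & 3 \end{vmatrix}$
(c) $\begin{vmatrix} 2 & 4 & 3 \\ -1 & 3 & 0 \\ 0 & 2 & 1 \end{vmatrix}$

(d) $\begin{vmatrix} 1 & 2 & -1 \\ 0 & 1 & 1 \\ 0 & 2 & 7 \end{vmatrix}$
(e) $\begin{vmatrix} -1 & 5 & 3 \\ 4 & 0 & 0 \\ 2 & 7 & 8 \end{vmatrix}$
(f) $\begin{vmatrix} 3 & 1 & ? \\ 4 & 5 & ? \\ -1 & 2 & ? \end{vmatrix}$ (the third column is not legible in the source)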

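2. Compute the following determinants.

(a) $\begin{vmatrix} 1 & 1 & -2 & 4 \\ 0 & 1 & 1 & 3 \\ 2 & -1 & 1 & 0 \\ 3 & 1 & 2 & 5 \end{vmatrix}$
(b) $\begin{vmatrix} -1 & 1 & 2 & 0 \\ 0 & 3 & 2 & 1 \\ 0 & 4 & 1 & 2 \\ 3 & 1 & 5 & 7 \end{vmatrix}$
(c) $\begin{vmatrix} 3 & 1 & 1 \\ 2 & 5 & 5 \\ 8 & 7 & 7 \end{vmatrix}$

(d) $\begin{vmatrix} 4 & -9 & 2 \\ 4 & -9 & 2 \\ 3 & 1 & 0 \end{vmatrix}$
(e) $\begin{vmatrix} 4 & -1 & 1 \\ 2 & 0 & 0 \\ 1 & 5 & 7 \end{vmatrix}$
(f) $\begin{vmatrix} 2 & 0 & 0 \\ 1 & 1 & 0 \\ ? & 5 & 7 \end{vmatrix}$ (one entry is not legible in the source)

(g) $\begin{vmatrix} 4 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 27 \end{vmatrix}$
(h) $\begin{vmatrix} 5 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 9 \end{vmatrix}$
(i) (only one column, with entries $-1$, $1$, $2$, is legible in the source)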


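3. In general, what is the determinant of a diagonal matrix

$$\begin{vmatrix} a_{11} & 0 & 0 & \cdots & 0 \\ 0 & a_{22} & 0 & \cdots & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & a_{nn} \end{vmatrix}?$$

4. Compute the determinant

$$\begin{vmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{vmatrix}.$$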


5. (a) Let $x_1$, $x_2$, $x_3$ be numbers. Show that

$$\begin{vmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ 1 & x_3 & x_3^2 \end{vmatrix} = (x_2 - x_1)(x_3 - x_1)(x_3 - x_2).$$

(b) If $x_1, \dots, x_n$ are numbers, then show by induction that

$$\begin{vmatrix} 1 & x_1 & \cdots & x_1^{n-1} \\ 1 & x_2 & \cdots & x_2^{n-1} \\ \vdots & \vdots & & \vdots \\ 1 & x_n & \cdots & x_n^{n-1} \end{vmatrix} = \prod_{i<j} (x_j - x_i),$$

the symbol on the right meaning that it is the product of all terms
$x_j - x_i$ with $i < j$ and $i$, $j$ integers from 1 to $n$. This determinant is called
the Vandermonde determinant $V_n$. To do the induction easily, multiply
each column by $x_1$ and subtract it from the next column on the right,
starting from the right-hand side. You will find that

$$V_n = (x_n - x_1) \cdots (x_2 - x_1) V_{n-1}.$$
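
Before attempting the induction, it can be reassuring to check the Vandermonde formula numerically. The sketch below is not a proof; it simply compares both sides for one choice of integers. The helper names `det`, `vandermonde` and `vandermonde_product` are ours, and `det` uses the expansion according to the first row.

```python
from itertools import combinations
from math import prod


def det(a):
    """Determinant by expansion according to the first row."""
    if len(a) == 1:
        return a[0][0]
    total = 0
    for j, entry in enumerate(a[0]):
        sub = [row[:j] + row[j + 1:] for row in a[1:]]
        total += (-1) ** j * entry * det(sub)
    return total


def vandermonde(xs):
    """Matrix whose i-th row is (1, x_i, x_i^2, ..., x_i^(n-1))."""
    n = len(xs)
    return [[x ** k for k in range(n)] for x in xs]


def vandermonde_product(xs):
    """Product of x_j - x_i over all pairs i < j."""
    return prod(xs[j] - xs[i] for i, j in combinations(range(len(xs)), 2))


xs = [2, 3, 5, 7]
print(det(vandermonde(xs)) == vandermonde_product(xs))  # True
```

Because the entries are integers, the comparison is exact; repeating a value in `xs` makes both sides 0.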

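6. Find the determinants of the following matrices. (Several entries of this exercise are not legible in the source; they are marked by ?. Parts (c), (e) and (g) are not legible at all.)

(a) $\begin{pmatrix} ? & 2 & 5 \\ ? & 1 & 7 \\ ? & 0 & 3 \end{pmatrix}$
(b) $\begin{pmatrix} ? & 5 & 20 \\ 0 & 4 & 8 \\ 0 & 0 & 6 \end{pmatrix}$

(d) $\begin{pmatrix} ? & 98 & 54 \\ 0 & 2 & 46 \\ 0 & 0 & -1 \end{pmatrix}$
(f) $\begin{pmatrix} 4 & 0 & 0 \\ -5 & 2 & 0 \\ 79 & 54 & 1 \end{pmatrix}$

(h) $\begin{pmatrix} -5 & 0 & 0 & ? \\ 7 & 2 & 0 & 0 \\ -9 & 4 & 1 & 0 \\ 96 & 2 & 3 & 1 \end{pmatrix}$

(i) Let $A$ be a triangular $n \times n$ matrix, say a matrix such that all
components below the diagonal are equal to 0:

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ 0 & a_{22} & \cdots & a_{2n} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{pmatrix}.$$

What is $D(A)$?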

7. If $a(t)$, $b(t)$, $c(t)$, $d(t)$ are functions of $t$, one can form the determinant

$$\begin{vmatrix} a(t) & b(t) \\ c(t) & d(t) \end{vmatrix}$$

just as with numbers. Write out in full the determinant

$$\begin{vmatrix} \sin t & \cos t \\ -\cos t & \sin t \end{vmatrix}.$$

8. Write out in full the determinant

$$\begin{vmatrix} t + 1 & t - 1 \\ t & 2t + 5 \end{vmatrix}.$$

9. Let $f(t)$, $g(t)$ be two functions having derivatives of all orders. Let $\varphi(t)$ be
the function obtained by taking the determinant

$$\varphi(t) = \begin{vmatrix} f(t) & g(t) \\ f'(t) & g'(t) \end{vmatrix}.$$

Show that

$$\varphi'(t) = \begin{vmatrix} f(t) & g(t) \\ f''(t) & g''(t) \end{vmatrix},$$

i.e. the derivative is obtained by taking the derivative of the bottom row.

10. Let

$$A(t) = \begin{pmatrix} b_1(t) & c_1(t) \\ b_2(t) & c_2(t) \end{pmatrix}$$

be a $2 \times 2$ matrix of differentiable functions. Let $B(t)$ and $C(t)$ be its column
vectors. Let

$$\varphi(t) = \operatorname{Det}(A(t)).$$

Show that

$$\varphi'(t) = D(B'(t), C(t)) + D(B(t), C'(t)).$$


11. Let $a_1, \dots, a_n$ be distinct numbers, $\ne 0$. Show that the functions

$$e^{a_1 t}, \dots, e^{a_n t}$$

are linearly independent over the complex numbers. [Hint: Suppose we have
a linear relation

$$c_1 e^{a_1 t} + \cdots + c_n e^{a_n t} = 0$$

with constants $c_i$, valid for all $t$. If not all $c_i$ are 0, without loss of generality,
we may assume that none of them is 0. Differentiate the above relation
$n - 1$ times. You get a system of linear equations. The determinant of its
coefficients must be zero. (Why?) Get a contradiction from this.]

VI, §4. CRAMER’S RULE

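The properties of the preceding section can be used to prove a well-known
rule used in solving linear equations.

Theorem 4.1 (Cramer’s rule). Let $A^1, \dots, A^n$ be column vectors such that

$$D(A^1, \dots, A^n) \ne 0.$$

Let $B$ be a column vector. If $x_1, \dots, x_n$ are numbers such that

$$x_1 A^1 + \cdots + x_n A^n = B,$$

then for each $j = 1, \dots, n$ we have

$$x_j = \frac{D(A^1, \dots, B, \dots, A^n)}{D(A^1, \dots, A^n)},$$

where $B$ occurs in the $j$-th column instead of $A^j$. In other words,

$$x_j = \frac{\begin{vmatrix} a_{11} & \cdots & b_1 & \cdots & a_{1n} \\ a_{21} & \cdots & b_2 & \cdots & a_{2n} \\ \vdots & & \vdots & & \vdots \\ a_{n1} & \cdots & b_n & \cdots & a_{nn} \end{vmatrix}}{\begin{vmatrix} a_{11} & \cdots & a_{1j} & \cdots & a_{1n} \\ a_{21} & \cdots & a_{2j} & \cdots & a_{2n} \\ \vdots & & \vdots & & \vdots \\ a_{n1} & \cdots & a_{nj} & \cdots & a_{nn} \end{vmatrix}}.$$

(The numerator is obtained from $A$ by replacing the $j$-th column $A^j$ by
$B$. The denominator is the determinant of the matrix $A$.)

Theorem 4.1 gives us an explicit way of finding the coordinates of $B$
with respect to $A^1, \dots, A^n$. In the language of linear equations, Theorem
4.1 allows us to solve explicitly in terms of determinants the system of $n$
linear equations in $n$ unknowns:

$$\begin{aligned} x_1 a_{11} + \cdots + x_n a_{1n} &= b_1 \\ &\;\;\vdots \\ x_1 a_{n1} + \cdots + x_n a_{nn} &= b_n. \end{aligned}$$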

We now prove Theorem 4.1.

Let $B$ be written as in the statement of the theorem, and consider the
determinant of the matrix obtained by replacing the $j$-th column of $A$ by
$B$. Then

$$D(A^1, \dots, B, \dots, A^n) = D(A^1, \dots, x_1 A^1 + \cdots + x_n A^n, \dots, A^n).$$

We use property 1 and obtain a sum:

$$D(A^1, \dots, x_1 A^1, \dots, A^n) + \cdots + D(A^1, \dots, x_j A^j, \dots, A^n) + \cdots + D(A^1, \dots, x_n A^n, \dots, A^n),$$

which by property 1 again, is equal to

$$x_1 D(A^1, \dots, A^1, \dots, A^n) + \cdots + x_j D(A^1, \dots, A^n) + \cdots + x_n D(A^1, \dots, A^n, \dots, A^n).$$

In every term of this sum except the $j$-th term, two column vectors are
equal. Hence every term except the $j$-th term is equal to 0, by property
5. The $j$-th term is equal to

$$x_j D(A^1, \dots, A^n),$$

and is therefore equal to the determinant we started with, namely
$D(A^1, \dots, B, \dots, A^n)$. We can solve for $x_j$, and obtain precisely the
expression given in the statement of the theorem.

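Example. Solve the system of linear equations:

$$\begin{aligned} 3x + 2y + 4z &= 1, \\ 2x - y + z &= 0, \\ x + 2y + 3z &= 1. \end{aligned}$$

We have:

$$x = \frac{\begin{vmatrix} 1 & 2 & 4 \\ 0 & -1 & 1 \\ 1 & 2 & 3 \end{vmatrix}}{\begin{vmatrix} 3 & 2 & 4 \\ 2 & -1 & 1 \\ 1 & 2 & 3 \end{vmatrix}}, \qquad
y = \frac{\begin{vmatrix} 3 & 1 & 4 \\ 2 & 0 & 1 \\ 1 & 1 & 3 \end{vmatrix}}{\begin{vmatrix} 3 & 2 & 4 \\ 2 & -1 & 1 \\ 1 & 2 & 3 \end{vmatrix}}, \qquad
z = \frac{\begin{vmatrix} 3 & 2 & 1 \\ 2 & -1 & 0 \\ 1 & 2 & 1 \end{vmatrix}}{\begin{vmatrix} 3 & 2 & 4 \\ 2 & -1 & 1 \\ 1 & 2 & 3 \end{vmatrix}}.$$

Observe how the column

$$B = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}$$

shifts from the first column when solving for $x$, to the second column
when solving for $y$, to the third column when solving for $z$. The
denominator in all three expressions is the same, namely it is the determinant of
the matrix of coefficients of the equations.

We know how to compute $3 \times 3$ determinants, and we then find

$$x = -\tfrac{1}{5}, \qquad y = 0, \qquad z = \tfrac{2}{5}.$$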
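
Cramer's rule for a $3 \times 3$ system is short enough to write out in full. The following Python sketch, with the hypothetical names `det3` and `cramer3`, replaces the $j$-th column of the coefficient matrix by the right-hand side and divides by the determinant of the coefficients, using exact fractions.

```python
from fractions import Fraction


def det3(m):
    """Determinant of a 3 x 3 matrix, expanded according to the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def cramer3(coeffs, rhs):
    """Solve a 3 x 3 system with integer entries by Cramer's rule."""
    d = det3(coeffs)
    if d == 0:
        raise ValueError("determinant of the coefficients is 0")
    solution = []
    for j in range(3):
        # Replace the j-th column of the coefficient matrix by the column B.
        replaced = [row[:j] + [rhs[r]] + row[j + 1:] for r, row in enumerate(coeffs)]
        solution.append(Fraction(det3(replaced), d))
    return solution


coeffs = [[3, 2, 4], [2, -1, 1], [1, 2, 3]]
rhs = [1, 0, 1]
x, y, z = cramer3(coeffs, rhs)
print(x, y, z)  # -1/5 0 2/5
```

For larger systems this approach is far slower than elimination; its interest lies in giving an explicit formula for the solution.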

Determinants also allow us to determine when vectors are linearly
independent.

Theorem 4.2. Let $A^1, \dots, A^n$ be column vectors (of dimension $n$). If they
are linearly dependent, then

$$D(A^1, \dots, A^n) = 0.$$

If $D(A^1, \dots, A^n) \ne 0$, then $A^1, \dots, A^n$ are linearly independent.

Proof. The second assertion is merely an equivalent formulation of
the first. It will therefore suffice to prove the first. Assume that $A^1, \dots, A^n$
are linearly dependent. We can find numbers $x_1, \dots, x_n$ not all 0 such
that

$$x_1 A^1 + \cdots + x_n A^n = O.$$

Suppose $x_j \ne 0$. Then

$$x_j A^j = -\sum_{k \ne j} x_k A^k.$$

We note that there is no $j$-th term on the right hand side. Dividing by
$x_j$ we obtain $A^j$ as a linear combination of the vectors $A^k$ with $k \ne j$. In
other words, there are numbers $y_k$ $(k \ne j)$ such that

$$A^j = \sum_{k \ne j} y_k A^k,$$

namely $y_k = -x_k / x_j$. By linearity, we get

$$D(A^1, \dots, A^n) = D\Bigl(A^1, \dots, \sum_{k \ne j} y_k A^k, \dots, A^n\Bigr) = \sum_{k \ne j} y_k D(A^1, \dots, A^k, \dots, A^n)$$

with $A^k$ in the $j$-th column, and $k \ne j$. In the sum on the right, each
determinant has the $k$-th column equal to the $j$-th column and is therefore
equal to 0 by property 5. This proves Theorem 4.2.

Corollary 4.3. If $A^1, \dots, A^n$ are column vectors of $K^n$ such that
$D(A^1, \dots, A^n) \ne 0$, and if $B$ is a column vector of $K^n$, then there exist
numbers $x_1, \dots, x_n$ such that

$$x_1 A^1 + \cdots + x_n A^n = B.$$

Proof. According to the theorem, $A^1, \dots, A^n$ are linearly independent,
and hence form a basis of $K^n$. Hence any vector of $K^n$ can be written as
a linear combination of $A^1, \dots, A^n$.

In terms of linear equations, this corollary shows:

If a system of $n$ linear equations in $n$ unknowns has a matrix of
coefficients whose determinant is not 0, then this system has a solution, which
can be determined by Cramer's rule.

In Theorem 5.3 we shall prove the converse of Corollary 4.3, and so
we get:

Theorem 4.4. The determinant $D(A^1, \dots, A^n)$ is equal to 0 if and only if
$A^1, \dots, A^n$ are linearly dependent.


VI, §4. EXERCISES 

1. Solve the following systems of linear equations. 

(a) 3x + y — z = 0 
x + y + z = 0 
y — z = 1 


(b) 2x - y + z = 1 
x 3y — 2z = 0 
4x - 3y + z = 2

(c) 4x + y + z + w = 1
x - y + 2z - 3w = 0
2x + y + 3z + 5w = 0
x + y - z - w = 2


(d) x + 2y - 3z + 5w = 0
2x + y - 4z - w = 1
x + y + z + w = 0
-x - y - z + w = 4


VI, §5. TRIANGULATION OF A MATRIX BY COLUMN 
OPERATIONS 

To compute determinants we have used the following two column
operations:

COL 1. Add a scalar multiple of one column to another.

COL 2. Interchange two columns.

We define two matrices $A$ and $B$ (both $n \times n$) to be column equivalent
if $B$ can be obtained from $A$ by making a succession of column
operations COL 1 and COL 2. Then we have:

Proposition 5.1. Let $A$ and $B$ be column equivalent. Then

$$\operatorname{rank} A = \operatorname{rank} B;$$

$A$ is invertible if and only if $B$ is invertible; $\operatorname{Det}(A) = 0$ if and only if
$\operatorname{Det}(B) = 0$.

Proof. Let $A$ be an $n \times n$ matrix. If we interchange two columns of
$A$, then the column space, i.e. the space generated by the columns of $A$,
is unchanged. Let $A^1, \dots, A^n$ be the columns of $A$. Let $x$ be a scalar.
Then the space generated by

$$A^1 + xA^2, A^2, \dots, A^n$$

is the same as the space generated by $A^1, \dots, A^n$. (Immediate
verification.) Hence if $B$ is column equivalent to $A$, it follows that the column
space of $B$ is equal to the column space of $A$, so $\operatorname{rank} A = \operatorname{rank} B$.

The determinant changes at most by a sign when we make a column
operation, so $\operatorname{Det}(A) = 0$ if and only if $\operatorname{Det}(B) = 0$.

Finally, if $A$ is invertible, then $\operatorname{rank} A = n$ by Theorem 2.2 of Chapter
IV, so $\operatorname{rank} B = n$, and so $B$ is invertible by that same theorem. This
concludes the proof.

Theorem 5.2. Let $A$ be an $n \times n$ matrix. Then $A$ is column equivalent
to a triangular matrix

$$B = \begin{pmatrix} b_{11} & 0 & \cdots & 0 \\ b_{21} & b_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ b_{n1} & b_{n2} & \cdots & b_{nn} \end{pmatrix}.$$

Proof. By induction on $n$. Let $A = (a_{ij})$. There is nothing to prove if
$n = 1$. Let $n > 1$. If all elements of the first row of $A$ are 0, then we
conclude the proof by induction by making column operations on the
$(n-1) \times (n-1)$ matrix

$$\begin{pmatrix} a_{22} & \cdots & a_{2n} \\ \vdots & & \vdots \\ a_{n2} & \cdots & a_{nn} \end{pmatrix}.$$

Suppose some element of the first row of $A$ is not 0. By column
operations, we can suppose that $a_{11} \ne 0$. By adding a scalar multiple of the
first column to each of the other columns, we can then get an equivalent
matrix $B$ such that

$$b_{12} = \cdots = b_{1n} = 0,$$

that is, all elements of the first row are 0 except for $a_{11}$. We can again
apply induction to the matrix obtained by deleting the first row and first
column. This concludes the proof.
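
The inductive proof of Theorem 5.2 is effectively an algorithm. The sketch below follows it row by row using only the operations COL 1 and COL 2; it is an illustration under our own naming (`column_triangulate` is hypothetical), with exact rational arithmetic. It also counts the interchanges, so that the determinant of the original matrix can be recovered as $(-1)^{\text{swaps}}$ times the product of the diagonal entries of the triangular matrix.

```python
from fractions import Fraction


def column_triangulate(rows):
    """Return (B, swaps): B is lower triangular and column equivalent to rows."""
    a = [[Fraction(x) for x in row] for row in rows]
    n = len(a)
    swaps = 0
    for r in range(n):
        # Find a column at or to the right of the diagonal with a non-zero entry in row r.
        pivot = next((c for c in range(r, n) if a[r][c] != 0), None)
        if pivot is None:
            continue  # row r is already 0 from the diagonal on
        if pivot != r:
            for row in a:  # COL 2: interchange columns r and pivot
                row[r], row[pivot] = row[pivot], row[r]
            swaps += 1
        for c in range(r + 1, n):
            t = a[r][c] / a[r][r]
            for row in a:  # COL 1: subtract t times column r from column c
                row[c] -= t * row[r]
    return a, swaps


b, swaps = column_triangulate([[3, 0, 1], [1, 2, 5], [-1, 4, 2]])
diagonal_product = b[0][0] * b[1][1] * b[2][2]
print((-1) ** swaps * diagonal_product)  # -42
```

The printed value is the determinant found in Example 1 of §3, as expected, since COL 1 leaves the determinant unchanged and COL 2 changes its sign.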

Theorem 5.3. Let $A = (A^1, \dots, A^n)$ be a square matrix. The following
conditions are equivalent:

(a) $A$ is invertible.

(b) The columns $A^1, \dots, A^n$ are linearly independent.

(c) $D(A) \ne 0$.

Proof. That (a) is equivalent to (b) was proved in Theorem 2.2 of
Chapter IV. By Proposition 5.1 and Theorem 5.2 we may assume that $A$
is a triangular matrix. The determinant is then the product of the
diagonal elements, and is 0 if and only if some diagonal element is 0. But
this condition is equivalent to the column vectors being linearly
dependent, thus concluding the proof.





VI, §5. EXERCISES 

1. (a) Let $1 \le r, s \le n$ and $r \ne s$. Let $J_{rs}$ be the $n \times n$ matrix whose
$rs$-component is 1 and all other components are 0. Let $E_{rs} = I + J_{rs}$. Show that
$D(E_{rs}) = 1$.

(b) Let $A$ be an $n \times n$ matrix. What is the effect of multiplying $E_{rs}A$? of
multiplying $AE_{rs}$?

2. In the proof of Theorem 5.3, we used the fact that if $A$ is a triangular matrix,
then the column vectors are linearly independent if and only if all diagonal
elements are $\ne 0$. Give the details of the proof of this fact.


VI, §6. PERMUTATIONS 

We shall deal only with permutations of the set of integers $\{1, \dots, n\}$,
which we denote by $J_n$. By definition, a permutation of this set is a map

$$\sigma: \{1, \dots, n\} \to \{1, \dots, n\}$$

of $J_n$ into itself such that, if $i, j \in J_n$ and $i \ne j$, then $\sigma(i) \ne \sigma(j)$. Thus a
permutation is a bijection of $J_n$ with itself. If $\sigma$ is such a permutation,
then the set of integers

$$\{\sigma(1), \dots, \sigma(n)\}$$

has $n$ distinct elements, and hence consists again of the integers $1, \dots, n$ in
a different arrangement. Thus to each integer $j \in J_n$ there exists a unique
integer $k$ such that $\sigma(k) = j$. We can define the inverse permutation,
denoted by $\sigma^{-1}$, as the map

$$\sigma^{-1}: J_n \to J_n$$

such that $\sigma^{-1}(k)$ = unique integer $j \in J_n$ such that $\sigma(j) = k$. If $\sigma$, $\tau$ are
permutations of $J_n$, then we can form their composite map

$$\sigma \circ \tau,$$

and this map will again be a permutation. We shall usually omit the
small circle, and write $\sigma\tau$ for the composite map. Thus

$$(\sigma\tau)(i) = \sigma(\tau(i)).$$

By definition, for any permutation $\sigma$, we have

$$\sigma\sigma^{-1} = \mathrm{id} \quad\text{and}\quad \sigma^{-1}\sigma = \mathrm{id},$$

where id is the identity permutation, that is, the permutation such that
$\mathrm{id}(i) = i$ for all $i = 1, \dots, n$.

If $\sigma_1, \dots, \sigma_r$ are permutations of $J_n$, then the inverse of the composite
map

$$\sigma_1 \cdots \sigma_r$$

is the permutation

$$\sigma_r^{-1} \cdots \sigma_1^{-1}.$$

This is trivially seen by direct multiplication.

A transposition is a permutation which interchanges two numbers and
leaves the others fixed. The inverse of a transposition $\tau$ is obviously
equal to the transposition $\tau$ itself, so that $\tau^2 = \mathrm{id}$.

Proposition 6.1. Every permutation of $J_n$ can be expressed as a product
of transpositions.

Proof. We shall prove our assertion by induction on $n$. For $n = 1$,
there is nothing to prove. Let $n > 1$ and assume the assertion proved for
$n - 1$. Let $\sigma$ be a permutation of $J_n$. Let $\sigma(n) = k$. If $k \ne n$ let $\tau$ be the
transposition of $J_n$ such that $\tau(k) = n$, $\tau(n) = k$. If $k = n$, let $\tau = \mathrm{id}$. Then
$\tau\sigma$ is a permutation such that

$$\tau\sigma(n) = \tau(k) = n.$$

In other words, $\tau\sigma$ leaves $n$ fixed. We may therefore view $\tau\sigma$ as a
permutation of $J_{n-1}$, and by induction, there exist transpositions $\tau_1, \dots, \tau_s$ of
$J_{n-1}$, leaving $n$ fixed, such that

$$\tau\sigma = \tau_1 \cdots \tau_s.$$

We can now write

$$\sigma = \tau^{-1}\tau_1 \cdots \tau_s = \tau\tau_1 \cdots \tau_s,$$

thereby proving our proposition.

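Example 1. A permutation $\sigma$ of the integers $\{1, \dots, n\}$ is denoted by

$$\begin{bmatrix} 1 & \cdots & n \\ \sigma(1) & \cdots & \sigma(n) \end{bmatrix}.$$

Thus

$$\begin{bmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \end{bmatrix}$$

denotes the permutation $\sigma$ such that $\sigma(1) = 2$, $\sigma(2) = 1$, and $\sigma(3) = 3$.
This permutation is in fact a transposition. If $\sigma'$ is the permutation

$$\begin{bmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \end{bmatrix},$$

then $\sigma\sigma' = \sigma \circ \sigma'$ is the permutation such that

$$\begin{aligned} \sigma\sigma'(1) &= \sigma(\sigma'(1)) = \sigma(3) = 3, \\ \sigma\sigma'(2) &= \sigma(\sigma'(2)) = \sigma(1) = 2, \\ \sigma\sigma'(3) &= \sigma(\sigma'(3)) = \sigma(2) = 1, \end{aligned}$$

so that we can write

$$\sigma\sigma' = \begin{bmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{bmatrix}.$$

Furthermore, the inverse of $\sigma'$ is the permutation

$$\begin{bmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \end{bmatrix},$$

as is immediately determined from the definitions: Since $\sigma'(1) = 3$, we
must have $\sigma'^{-1}(3) = 1$. Since $\sigma'(2) = 1$, we must have $\sigma'^{-1}(1) = 2$.
Finally, since $\sigma'(3) = 2$, we must have $\sigma'^{-1}(2) = 3$.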
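
The computations of Example 1 are easy to mirror in code. In the sketch below (our own convention, not the book's), a permutation $\sigma$ of $\{1, \dots, n\}$ is stored as the tuple of its bottom row $(\sigma(1), \dots, \sigma(n))$; the function names `compose` and `inverse` are hypothetical.

```python
def compose(s, t):
    """Return the composite st, defined by (st)(i) = s(t(i))."""
    return tuple(s[t[i] - 1] for i in range(len(t)))


def inverse(s):
    """Return the inverse permutation: if s(j) = k then inverse(k) = j."""
    inv = [0] * len(s)
    for j, k in enumerate(s, start=1):
        inv[k - 1] = j
    return tuple(inv)


sigma = (2, 1, 3)        # the transposition of Example 1
sigma_prime = (3, 1, 2)  # the permutation sigma' of Example 1

print(compose(sigma, sigma_prime))  # (3, 2, 1)
print(inverse(sigma_prime))         # (2, 3, 1)
```

Both outputs agree with the bottom rows found by hand in Example 1.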


Example 2. We wish to express the permutation 



as a product of transpositions. Let $\tau$ be the transposition which
interchanges 3 and 1, and leaves 2 fixed. Then using the definition, we find
that



so that $\tau\sigma$ is a transposition, which we denote by $\tau'$. We can then write
$\tau\sigma = \tau'$, so that

$$\sigma = \tau^{-1}\tau' = \tau\tau'$$

because $\tau^{-1} = \tau$. This is the desired product.

Example 3. Express the permutation

$$\sigma = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 2 & 3 & 4 & 1 \end{bmatrix}$$

as a product of transpositions.

Let $\tau_1$ be the transposition which interchanges 1 and 2, and leaves 3,
4 fixed. Then

$$\tau_1\sigma = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 1 & 3 & 4 & 2 \end{bmatrix}.$$

Now let $\tau_2$ be the transposition which interchanges 2 and 3, and leaves
1, 4 fixed. Then

$$\tau_2\tau_1\sigma = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 1 & 2 & 4 & 3 \end{bmatrix},$$

and we see that $\tau_2\tau_1\sigma$ is a transposition, which we may denote by $\tau_3$.
Then we get $\tau_2\tau_1\sigma = \tau_3$ so that

$$\sigma = \tau_1\tau_2\tau_3.$$

Proposition 6.2. To each permutation $\sigma$ of $J_n$ it is possible to assign a
sign 1 or $-1$, denoted by $\epsilon(\sigma)$, satisfying the following conditions:

(a) If $\tau$ is a transposition, then $\epsilon(\tau) = -1$.

(b) If $\sigma$, $\sigma'$ are permutations of $J_n$, then

$$\epsilon(\sigma\sigma') = \epsilon(\sigma)\epsilon(\sigma').$$

In fact, if $A = (A^1, \dots, A^n)$ is an $n \times n$ matrix, then $\epsilon(\sigma)$ can be defined
by the condition

$$D(A^{\sigma(1)}, \dots, A^{\sigma(n)}) = \epsilon(\sigma) D(A^1, \dots, A^n).$$

Proof. Observe that $(A^{\sigma(1)}, \dots, A^{\sigma(n)})$ is simply a different ordering from
$(A^1, \dots, A^n)$. Let $\sigma$ be a permutation of $J_n$. Then

$$D(A^{\sigma(1)}, \dots, A^{\sigma(n)}) = \pm D(A^1, \dots, A^n),$$

and the sign + or - is determined by $\sigma$, and does not depend on
$A^1, \dots, A^n$. Indeed, by making a succession of transpositions, we can
return $(A^{\sigma(1)}, \dots, A^{\sigma(n)})$ to the standard ordering $(A^1, \dots, A^n)$, and each
transposition changes the determinant by a sign. Thus we may define

$$\epsilon(\sigma) = \frac{D(A^{\sigma(1)}, \dots, A^{\sigma(n)})}{D(A^1, \dots, A^n)}$$

for any choice of $A^1, \dots, A^n$ whose determinant is not 0, say the unit
vectors $E^1, \dots, E^n$. There are of course many ways of applying a succession
of transpositions to return $(A^{\sigma(1)}, \dots, A^{\sigma(n)})$ to the standard ordering, but
since the determinant is a well defined function, it follows that the sign
$\epsilon(\sigma)$ is also well defined, and is the same, no matter which way we select.
Thus we have

$$D(A^{\sigma(1)}, \dots, A^{\sigma(n)}) = \epsilon(\sigma) D(A^1, \dots, A^n),$$

and of course this holds even if $D(A^1, \dots, A^n) = 0$ because in this case
both sides are equal to 0.

If $\tau$ is a transposition, then assertion (a) is merely a translation of
property 4.

Finally, let $\sigma$, $\sigma'$ be permutations of $J_n$. Let $C^j = A^{\sigma'(j)}$ for $j = 1, \dots, n$.
Then on the one hand we have

$$(*) \qquad D(A^{\sigma'\sigma(1)}, \dots, A^{\sigma'\sigma(n)}) = \epsilon(\sigma'\sigma) D(A^1, \dots, A^n),$$

and on the other hand, we have

$$\begin{aligned} D(A^{\sigma'\sigma(1)}, \dots, A^{\sigma'\sigma(n)}) &= D(C^{\sigma(1)}, \dots, C^{\sigma(n)}) \\ &= \epsilon(\sigma) D(C^1, \dots, C^n) \\ (**) \qquad &= \epsilon(\sigma)\epsilon(\sigma') D(A^1, \dots, A^n). \end{aligned}$$

Let $A^1, \dots, A^n$ be the unit vectors $E^1, \dots, E^n$. From the equality between
(*) and (**), we conclude that $\epsilon(\sigma'\sigma) = \epsilon(\sigma')\epsilon(\sigma)$, thus proving our
proposition.

Corollary 6.3. If a permutation $\sigma$ of $J_n$ is expressed as a product of
transpositions,

$$\sigma = \tau_1 \cdots \tau_s,$$

where each $\tau_i$ is a transposition, then $s$ is even or odd according as
$\epsilon(\sigma) = 1$ or $-1$.

Proof. We have

$$\epsilon(\sigma) = \epsilon(\tau_1) \cdots \epsilon(\tau_s) = (-1)^s,$$

whence our assertion is clear.

Corollary 6.4. If $\sigma$ is a permutation of $J_n$, then

$$\epsilon(\sigma) = \epsilon(\sigma^{-1}).$$

Proof. We have

$$1 = \epsilon(\mathrm{id}) = \epsilon(\sigma\sigma^{-1}) = \epsilon(\sigma)\epsilon(\sigma^{-1}).$$

Hence either $\epsilon(\sigma)$ and $\epsilon(\sigma^{-1})$ are both equal to 1, or both equal to $-1$,
as desired.

As a matter of terminology, a permutation is called even if its sign is
1, and it is called odd if its sign is $-1$. Thus every transposition is odd.

Example 4. The sign of the permutation $\sigma$ in Example 2 is equal to 1
because $\sigma = \tau\tau'$. The sign of the permutation $\sigma$ in Example 3 is equal to
$-1$ because $\sigma = \tau_1\tau_2\tau_3$.
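
The induction in the proof of Proposition 6.1 gives a concrete procedure for writing a permutation as a product of transpositions, and Corollary 6.3 then yields its sign. The Python sketch below follows that induction: it fixes $n$, then $n - 1$, and so on. All names are ours, a permutation is stored as the tuple $(\sigma(1), \dots, \sigma(n))$, and the transpositions it finds need not be the ones chosen by hand in Example 3 (only the parity of their number is determined).

```python
def compose(s, t):
    """Return the composite st, defined by (st)(i) = s(t(i))."""
    return tuple(s[t[i] - 1] for i in range(len(t)))


def transposition_factors(sigma):
    """Return pairs (k, n) whose transpositions, multiplied in order, give sigma."""
    s = list(sigma)
    factors = []
    for n in range(len(s), 1, -1):
        k = s[n - 1]
        if k != n:
            factors.append((k, n))
            # Replace s by tau s, where tau interchanges k and n; now s(n) = n.
            s = [n if v == k else k if v == n else v for v in s]
    return factors


def as_permutation(pair, size):
    """The transposition interchanging the two numbers in pair."""
    a, b = pair
    p = list(range(1, size + 1))
    p[a - 1], p[b - 1] = b, a
    return tuple(p)


def product(factors, size):
    """Multiply the transpositions in the given order."""
    result = tuple(range(1, size + 1))
    for pair in factors:
        result = compose(result, as_permutation(pair, size))
    return result


def sign(sigma):
    """Sign of sigma, computed as (-1)^s for a product of s transpositions."""
    return (-1) ** len(transposition_factors(sigma))


sigma = (2, 3, 4, 1)  # the permutation of Example 3
factors = transposition_factors(sigma)
print(factors)                       # [(1, 4), (1, 3), (1, 2)]
print(product(factors, 4) == sigma)  # True
print(sign(sigma))                   # -1
```

The sign $-1$ agrees with Example 4, even though the three transpositions found here differ from $\tau_1$, $\tau_2$, $\tau_3$ of Example 3.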


VI, §6. EXERCISES 

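1. Determine the sign of the following permutations. (Parts (a), (d) and (i) are not legible in the source.)

(b) $\begin{bmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \end{bmatrix}$
(c) $\begin{bmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{bmatrix}$

(e) $\begin{bmatrix} 1 & 2 & 3 & 4 \\ 2 & 1 & 4 & 3 \end{bmatrix}$
(f) $\begin{bmatrix} 1 & 2 & 3 & 4 \\ 3 & 2 & 4 & 1 \end{bmatrix}$

(g) $\begin{bmatrix} 1 & 2 & 3 & 4 \\ 4 & 2 & 1 & 3 \end{bmatrix}$
(h) $\begin{bmatrix} 1 & 2 & 3 & 4 \\ 3 & 1 & 4 & 2 \end{bmatrix}$

2. In each one of the cases of Exercise 1, write the inverse of the permutation.

3. Show that the number of odd permutations of $\{1, \dots, n\}$ for $n \ge 2$ is equal to
the number of even permutations. [Hint: Let $\tau$ be a transposition. Show that
the map $\sigma \mapsto \tau\sigma$ establishes an injective and surjective map between the even
and the odd permutations.]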


VI, §7. EXPANSION FORMULA AND UNIQUENESS OF 
DETERMINANTS 

We make some remarks concerning an expansion of determinants. We
shall generalize the formalism of bilinearity discussed in Chapter V, §4
and first discuss the $3 \times 3$ case.

Let $X^1$, $X^2$, $X^3$ be three vectors in $K^3$ and let $(b_{ij})$ $(i, j = 1, \dots, 3)$ be a
$3 \times 3$ matrix. Let

$$\begin{aligned} A^1 &= b_{11}X^1 + b_{21}X^2 + b_{31}X^3 = \sum_{k=1}^{3} b_{k1}X^k, \\ A^2 &= b_{12}X^1 + b_{22}X^2 + b_{32}X^3 = \sum_{l=1}^{3} b_{l2}X^l, \\ A^3 &= b_{13}X^1 + b_{23}X^2 + b_{33}X^3 = \sum_{m=1}^{3} b_{m3}X^m. \end{aligned}$$

Then we can expand using linearity,

$$\begin{aligned} D(A^1, A^2, A^3) &= D\Bigl(\sum_{k=1}^{3} b_{k1}X^k, \sum_{l=1}^{3} b_{l2}X^l, \sum_{m=1}^{3} b_{m3}X^m\Bigr) \\ &= \sum_{k=1}^{3} b_{k1} D\Bigl(X^k, \sum_{l=1}^{3} b_{l2}X^l, \sum_{m=1}^{3} b_{m3}X^m\Bigr) \\ &= \sum_{k=1}^{3} \sum_{l=1}^{3} b_{k1}b_{l2} D\Bigl(X^k, X^l, \sum_{m=1}^{3} b_{m3}X^m\Bigr) \\ &= \sum_{k=1}^{3} \sum_{l=1}^{3} \sum_{m=1}^{3} b_{k1}b_{l2}b_{m3} D(X^k, X^l, X^m). \end{aligned}$$

Or rewriting just the result, we find the expansion

$$D(A^1, A^2, A^3) = \sum_{k=1}^{3} \sum_{l=1}^{3} \sum_{m=1}^{3} b_{k1}b_{l2}b_{m3} D(X^k, X^l, X^m).$$

If we wish to get a similar expansion for the $n \times n$ case, we must
obviously adjust the notation, otherwise we run out of letters $k$, $l$, $m$. Thus
instead of using $k$, $l$, $m$, we observe that these values $k$, $l$, $m$ correspond
to an arbitrary choice of an integer 1, or 2, or 3 for each one of the
numbers 1, 2, 3 occurring as the second index in $b_{ij}$. Thus if we let $\sigma$
denote such a choice, we can write

$$k = \sigma(1), \quad l = \sigma(2), \quad m = \sigma(3)$$

and

$$b_{k1}b_{l2}b_{m3} = b_{\sigma(1),1}\, b_{\sigma(2),2}\, b_{\sigma(3),3}.$$

Thus $\sigma: \{1, 2, 3\} \to \{1, 2, 3\}$ is nothing but an association, i.e. a function,
from $J_3$ to $J_3$, and we can write

$$D(A^1, A^2, A^3) = \sum_{\sigma} b_{\sigma(1),1}\, b_{\sigma(2),2}\, b_{\sigma(3),3}\, D(X^{\sigma(1)}, X^{\sigma(2)}, X^{\sigma(3)}),$$

the sum being taken for all such possible $\sigma$.

We shall find an expression for the determinant which corresponds to
the six-term expansion for the $3 \times 3$ case. At the same time, observe that
the properties used in the proof are only properties 1, 2, 3, and their
consequences 4, 5, 6, so that our proof applies to any function $D$
satisfying these properties.

We first give the argument in the $2 \times 2$ case.

Let

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

be a $2 \times 2$ matrix, and let

$$A^1 = \begin{pmatrix} a \\ c \end{pmatrix}, \qquad A^2 = \begin{pmatrix} b \\ d \end{pmatrix}$$

be its column vectors. We can write

$$A^1 = aE^1 + cE^2 \quad\text{and}\quad A^2 = bE^1 + dE^2,$$

where $E^1$, $E^2$ are the unit column vectors. Then

$$\begin{aligned} D(A) = D(A^1, A^2) &= D(aE^1 + cE^2, bE^1 + dE^2) \\ &= ab\,D(E^1, E^1) + cb\,D(E^2, E^1) + ad\,D(E^1, E^2) + cd\,D(E^2, E^2) \\ &= -bc\,D(E^1, E^2) + ad\,D(E^1, E^2) \\ &= ad - bc. \end{aligned}$$

This proves that any function $D$ satisfying the basic properties of a
determinant is given by the formula of §1, namely $ad - bc$.

The proof in general is entirely similar, taking into account the $n$
components. It is based on an expansion similar to the one we have just
used in the $2 \times 2$ case. We can formulate it in a lemma, which is a key
lemma.

Lemma 7.1. Let $X^1, \dots, X^n$ be $n$ vectors in $n$-space. Let $B = (b_{ij})$ be an
$n \times n$ matrix, and let

$$\begin{aligned} A^1 &= b_{11}X^1 + \cdots + b_{n1}X^n \\ &\;\;\vdots \\ A^n &= b_{1n}X^1 + \cdots + b_{nn}X^n. \end{aligned}$$

Then

$$D(A^1, \dots, A^n) = \sum_{\sigma} \epsilon(\sigma)\, b_{\sigma(1),1} \cdots b_{\sigma(n),n}\, D(X^1, \dots, X^n),$$

where the sum is taken over all permutations $\sigma$ of $\{1, \dots, n\}$.

Proof. We must compute

$$D(b_{11}X^1 + \cdots + b_{n1}X^n, \dots, b_{1n}X^1 + \cdots + b_{nn}X^n).$$

Using the linearity property with respect to each column, we can express
this as a sum

$$\sum_{\sigma} b_{\sigma(1),1} \cdots b_{\sigma(n),n}\, D(X^{\sigma(1)}, \dots, X^{\sigma(n)}),$$

where $\sigma(1), \dots, \sigma(n)$ denote a choice of an integer between 1 and $n$ for
each value of $1, \dots, n$. Thus each $\sigma$ is a mapping of the set of integers
$\{1, \dots, n\}$ into itself, and the sum is taken over all such maps. If some $\sigma$
assigns the same integer to distinct values $i$, $j$ between 1 and $n$, then the
determinant on the right has two equal columns, and hence is equal to 0.
Consequently we can take our sum only for those $\sigma$ which are such that
$\sigma(i) \ne \sigma(j)$ whenever $i \ne j$, namely permutations. By Proposition 6.2 we
have

$$D(X^{\sigma(1)}, \dots, X^{\sigma(n)}) = \epsilon(\sigma) D(X^1, \dots, X^n).$$

Substituting this for our expressions of $D(A^1, \dots, A^n)$ obtained above, we
find the desired expression of the lemma.

Theorem 7.2. Determinants are uniquely determined by properties 1, 2,
and 3. Let $A = (a_{ij})$. The determinant satisfies the expression

$$D(A^1, \dots, A^n) = \sum_{\sigma} \epsilon(\sigma)\, a_{\sigma(1),1} \cdots a_{\sigma(n),n},$$

where the sum is taken over all permutations of the integers $\{1, \dots, n\}$.

Proof. We let $X^j = E^j$ be the unit vector having 1 in the $j$-th
component, and we let $b_{ij} = a_{ij}$ in Lemma 7.1. Since by hypothesis we have
$D(E^1, \dots, E^n) = 1$, we see that the formula of Theorem 7.2 drops out at
once.

We obtain further applications of the key Lemma 7.1. Every one of
the next results will be a direct application of this lemma.
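
The expansion of Theorem 7.2 can be evaluated literally by summing over all permutations. The sketch below does this with `itertools.permutations`; the names `sign` and `det_by_permutations` are ours, and indices are 0-based. One assumption is made that is not derived in the text: the sign $\epsilon(\sigma)$ is computed from the parity of the number of inversions of $\sigma$, a standard fact equivalent to counting transpositions. With $n!$ terms, this is useful only for small $n$.

```python
from itertools import permutations


def sign(p):
    """Sign of a permutation of 0..n-1, via the parity of its inversions."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return -1 if inversions % 2 else 1


def det_by_permutations(a):
    """Sum over all permutations s of sign(s) * a[s(1)][1] * ... * a[s(n)][n]."""
    n = len(a)
    total = 0
    for p in permutations(range(n)):
        term = sign(p)
        for j in range(n):
            term *= a[p[j]][j]  # the entry in row s(j), column j
        total += term
    return total


print(det_by_permutations([[1, 2, 1], [-1, 3, 1], [0, 1, 5]]))  # 23
```

For a $2 \times 2$ matrix the sum has two terms and reduces to $ad - bc$; for the matrix of Example 3 of §2 it returns 23, as found there by expansion according to a row.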

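Theorem 7.3. Let $A$, $B$ be two $n \times n$ matrices. Then

$$\operatorname{Det}(AB) = \operatorname{Det}(A)\operatorname{Det}(B).$$

The determinant of a product is equal to the product of the
determinants.

Proof. Let $A = (a_{ij})$ and $B = (b_{jk})$:

$$\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} b_{11} & \cdots & b_{1k} & \cdots & b_{1n} \\ \vdots & & \vdots & & \vdots \\ b_{n1} & \cdots & b_{nk} & \cdots & b_{nn} \end{pmatrix}.$$

Let $AB = C$, and let $C^k$ be the $k$-th column of $C$. Then by definition,

$$C^k = b_{1k}A^1 + \cdots + b_{nk}A^n.$$

Thus

$$\begin{aligned} D(AB) &= D(C^1, \dots, C^n) \\ &= D(b_{11}A^1 + \cdots + b_{n1}A^n, \dots, b_{1n}A^1 + \cdots + b_{nn}A^n) \\ &= \sum_{\sigma} b_{\sigma(1),1} \cdots b_{\sigma(n),n}\, D(A^{\sigma(1)}, \dots, A^{\sigma(n)}) \\ &= \sum_{\sigma} \epsilon(\sigma)\, b_{\sigma(1),1} \cdots b_{\sigma(n),n}\, D(A^1, \dots, A^n) && \text{by Lemma 7.1} \\ &= D(B)D(A) && \text{by Theorem 7.2.} \end{aligned}$$

This proves the theorem.

Corollary 7.4. Let $A$ be an invertible $n \times n$ matrix. Then

$$\operatorname{Det}(A^{-1}) = \operatorname{Det}(A)^{-1}.$$

Proof. We have $1 = D(I) = D(AA^{-1}) = D(A)D(A^{-1})$. This proves what
we wanted.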

Theorem 7.5. Let A be a square matrix. Then Det(A) = Det(S4). 

Proof. In Theorem 7.2, we had 

$$\operatorname{Det}(A) = \sum_{\sigma} \epsilon(\sigma)\, a_{\sigma(1),1} \cdots a_{\sigma(n),n}. \tag{$*$}$$

Let $\sigma$ be a permutation of $\{1, \ldots, n\}$. If $\sigma(j) = k$, then $\sigma^{-1}(k) = j$. We 
can therefore write 

$$a_{\sigma(j),j} = a_{k,\sigma^{-1}(k)}.$$

In a product 

$$a_{\sigma(1),1} \cdots a_{\sigma(n),n}$$

each integer $k$ from 1 to $n$ occurs precisely once among the integers 
$\sigma(1), \ldots, \sigma(n)$. Hence this product can be written 

$$a_{1,\sigma^{-1}(1)} \cdots a_{n,\sigma^{-1}(n)},$$

and our sum $(*)$ is equal to 

$$\sum_{\sigma} \epsilon(\sigma^{-1})\, a_{1,\sigma^{-1}(1)} \cdots a_{n,\sigma^{-1}(n)},$$

because $\epsilon(\sigma) = \epsilon(\sigma^{-1})$. In this sum, each term corresponds to a 
permutation $\sigma$. However, as $\sigma$ ranges over all permutations, so does $\sigma^{-1}$ because 
a permutation determines its inverse uniquely. Hence our sum is equal 
to 

$$\sum_{\sigma} \epsilon(\sigma)\, a_{1,\sigma(1)} \cdots a_{n,\sigma(n)}. \tag{$**$}$$

The sum $(**)$ is precisely the sum giving the expanded form of the 
determinant of the transpose of $A$. Hence we have proved what we wanted. 
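
To see the reindexing of this proof in the smallest non-trivial case, here is the case $n = 2$ written out (a worked instance added for illustration, not part of the original proof). The two permutations are the identity and the transposition, whose sign is $-1$:

```latex
\begin{aligned}
\sum_{\sigma} \epsilon(\sigma)\, a_{\sigma(1),1}\, a_{\sigma(2),2}
  &= a_{11}a_{22} - a_{21}a_{12}, \\
\sum_{\sigma} \epsilon(\sigma)\, a_{1,\sigma(1)}\, a_{2,\sigma(2)}
  &= a_{11}a_{22} - a_{12}a_{21}.
\end{aligned}
```

The first line is the expansion of the determinant by column indices, the second is the same expansion applied to the matrix with entries $a_{ji}$ in place of $a_{ij}$, and the two are visibly equal.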


VI, §7. EXERCISES 

1. Show that when $n = 3$, the expansion of Theorem 7.2 is the six-term 
expression given in §2. 

2. Go through the proof of Lemma 7.1 to verify that you did not use all the 
properties of determinants in the proof. You used only the first two 
properties. Thus let $F$ be any multilinear, alternating function. As in Lemma 7.1, let 

$$A^j = \sum_{i=1}^{n} b_{ij} X^i \quad \text{for } j = 1, \ldots, n.$$

Then 

$$F(A^1, \ldots, A^n) = \sum_{\sigma} \epsilon(\sigma)\, b_{\sigma(1),1} \cdots b_{\sigma(n),n}\, F(X^1, \ldots, X^n).$$

Why can you conclude that if $B$ is the matrix $(b_{ij})$, then 

$$F(A^1, \ldots, A^n) = D(B)\, F(X^1, \ldots, X^n)?$$

3. Let $F\colon \mathbf{R}^n \times \cdots \times \mathbf{R}^n \to \mathbf{R}$ be a function of $n$ variables, each of which ranges 
over $\mathbf{R}^n$. Assume that $F$ is linear in each variable, and that if $A^1, \ldots, A^n \in \mathbf{R}^n$ 
and if there exists a pair of integers $r$, $s$ with $1 \leq r, s \leq n$ such that $r \neq s$ and 
$A^r = A^s$ then $F(A^1, \ldots, A^n) = 0$. Let $B^i$ ($i = 1, \ldots, n$) be vectors and $c_{ij}$ numbers 
such that 

$$A^j = \sum_{i=1}^{n} c_{ij} B^i.$$

(a) If $F(B^1, \ldots, B^n) = -3$ and $\det(c_{ij}) = 5$, what is $F(A^1, \ldots, A^n)$? Justify your 
answer by citing appropriate theorems, or proving it. 

(b) If $F(E^1, \ldots, E^n) = 2$ (where $E^1, \ldots, E^n$ are the standard unit vectors), and if 
$F(A^1, \ldots, A^n) = 10$, what is $D(A^1, \ldots, A^n)$? Again give reasons for your 
answer. 


VI, §8. INVERSE OF A MATRIX 

We consider first a special case. Let 

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

be a $2 \times 2$ matrix, and assume that its determinant $ad - bc \neq 0$. We 
wish to find an inverse for $A$, that is a $2 \times 2$ matrix 

$$X = \begin{pmatrix} x & y \\ z & w \end{pmatrix}$$

such that 

$$AX = XA = I.$$

Let us look at the first requirement, $AX = I$, which written out in full, 
looks like this: 

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x & y \\ z & w \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$

Let us look at the first column of $AX$. We must solve the equations 

$$\begin{aligned} ax + bz &= 1, \\ cx + dz &= 0. \end{aligned}$$

This is a system of two equations in two unknowns, $x$ and $z$, which we 
know how to solve. Similarly, looking at the second column, we see that 
we must solve a system of two equations in the unknowns $y$, $w$, namely 

$$\begin{aligned} ay + bw &= 0, \\ cy + dw &= 1. \end{aligned}$$

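Example. Let 

$$A = \begin{pmatrix} 2 & 1 \\ 4 & 3 \end{pmatrix}.$$

We seek a matrix $X$ such that $AX = I$. We must therefore solve the 
systems of linear equations 

$$\begin{aligned} 2x + z &= 1, \\ 4x + 3z &= 0, \end{aligned} \qquad \text{and} \qquad \begin{aligned} 2y + w &= 0, \\ 4y + 3w &= 1. \end{aligned}$$

By the ordinary method of solving two equations in two unknowns, we 
find 

$$x = \tfrac{3}{2}, \quad z = -2, \quad \text{and} \quad y = -\tfrac{1}{2}, \quad w = 1.$$

Thus the matrix 

$$X = \begin{pmatrix} \tfrac{3}{2} & -\tfrac{1}{2} \\ -2 & 1 \end{pmatrix}$$

is such that $AX = I$. The reader will also verify by direct multiplication 
that $XA = I$. This solves for the desired inverse. 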
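
The two systems of this example can be solved exactly with rational arithmetic. The sketch below is an illustration in Python; the helper `solve_2x2` is a name introduced here, and the coefficients 2, 1, 4, 3 are read off from the equations of the example.

```python
from fractions import Fraction


def solve_2x2(a, b, c, d, r1, r2):
    """Solve a*x + b*z = r1, c*x + d*z = r2 exactly, assuming a*d - b*c != 0."""
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("determinant is zero, no unique solution")
    x = (r1 * d - b * r2) / det
    z = (a * r2 - c * r1) / det
    return x, z


# First column of X: right-hand side (1, 0); second column: (0, 1).
x, z = solve_2x2(2, 1, 4, 3, 1, 0)
y, w = solve_2x2(2, 1, 4, 3, 0, 1)
X = [[x, y], [z, w]]

if __name__ == "__main__":
    print(X)
```

Each column of the inverse comes from one system with the same coefficient matrix and a different unit vector on the right-hand side, which is exactly the pattern used in the general argument.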


Similarly, in the $3 \times 3$ case, we would find three systems of linear 
equations, corresponding to the first column, the second column, and the 
third column. Each system could be solved to yield the inverse. We 
shall now give the general argument. 

Let $A$ be an $n \times n$ matrix. If $B$ is a matrix such that $AB = I$ and 
$BA = I$ ($I$ = unit $n \times n$ matrix), then we called $B$ an inverse of $A$, and we 
write $B = A^{-1}$. 

If there exists an inverse of $A$, then it is unique. 

Proof. Let $C$ be an inverse of $A$. Then $CA = I$. Multiplying by $B$ on 
the right, we obtain $CAB = B$. But $CAB = C(AB) = CI = C$. Hence 
$C = B$. A similar argument works for $AC = I$. 

A square matrix whose determinant is $\neq 0$, or equivalently which 
admits an inverse, is called non-singular. 

Theorem 8.1. Let $A = (a_{ij})$ be an $n \times n$ matrix, and assume that 
$D(A) \neq 0$. Then $A$ is invertible. Let $E^j$ be the $j$-th column unit vector, 
and let 

$$b_{ij} = \frac{D(A^1, \ldots, E^j, \ldots, A^n)}{D(A)},$$

where $E^j$ occurs in the $i$-th place. Then the matrix $B = (b_{ij})$ is an 
inverse for $A$. 

Proof. Let $X = (x_{ij})$ be an unknown $n \times n$ matrix. We wish to solve 
for the components $x_{ij}$, so that they satisfy $AX = I$. From the definition 
of products of matrices, this means that for each $j$, we must solve 

$$E^j = x_{1j}A^1 + \cdots + x_{nj}A^n.$$

This is a system of linear equations, which can be solved uniquely by 
Cramer's rule, and we obtain 

$$x_{ij} = \frac{D(A^1, \ldots, E^j, \ldots, A^n)}{D(A)},$$

which is the formula given in the theorem. 

We must still prove that $XA = I$. Note that $D({}^tA) \neq 0$. Hence by 
what we have already proved, we can find a matrix $Y$ such that ${}^tA\,Y = I$. 
Taking transposes, we obtain ${}^tY A = I$. Now we have 

$$I = {}^tY(AX)A = {}^tYA(XA) = XA,$$

thereby proving what we want, namely that $X = B$ is an inverse for $A$. 

We can write out the components of the matrix $B$ in Theorem 8.1 as 
follows: 

$$b_{ij} = \frac{\begin{vmatrix} a_{11} & \cdots & 0 & \cdots & a_{1n} \\ \vdots & & \vdots & & \vdots \\ a_{j1} & \cdots & 1 & \cdots & a_{jn} \\ \vdots & & \vdots & & \vdots \\ a_{n1} & \cdots & 0 & \cdots & a_{nn} \end{vmatrix}}{\operatorname{Det}(A)}.$$

If we expand the determinant in the numerator according to the $i$-th 
column, then all terms but one are equal to 0, and hence we obtain the 
numerator of $b_{ij}$ as a subdeterminant of $\operatorname{Det}(A)$. Let $A_{ij}$ be the matrix 
obtained from $A$ by deleting the $i$-th row and the $j$-th column. Then 

$$b_{ij} = \frac{(-1)^{i+j} \operatorname{Det}(A_{ji})}{\operatorname{Det}(A)}$$

(note the reversal of indices!) and thus we have the formula 

$$A^{-1} = \text{transpose of } \left( \frac{(-1)^{i+j} \operatorname{Det}(A_{ij})}{\operatorname{Det}(A)} \right).$$

VI, §8. EXERCISES 

1. Find the inverses of the matrices in Exercise 1, §3. 

2. Using the fact that if $A$, $B$ are two $n \times n$ matrices then 

$$\operatorname{Det}(AB) = \operatorname{Det}(A)\, \operatorname{Det}(B),$$

prove that a matrix $A$ such that $\operatorname{Det}(A) = 0$ does not have an inverse. 

3. Write down explicitly the inverses of the $2 \times 2$ matrices: 



4. If $A$ is an $n \times n$ matrix whose determinant is $\neq 0$, and $B$ is a given vector in 
$n$-space, show that the system of linear equations $AX = B$ has a unique 
solution. If $B = O$, this solution is $X = O$. 


VI, §9. THE RANK OF A MATRIX AND 
SUBDETERMINANTS 

Since determinants can be used to test linear independence, they can be 
used to determine the rank of a matrix. 

Example 1. Let 

$$A = \begin{pmatrix} 3 & 1 & 2 & 5 \\ 1 & 2 & -1 & 2 \\ 1 & 1 & 0 & 1 \end{pmatrix}.$$

This is a $3 \times 4$ matrix. Its rank is at most 3. If we can find three 
linearly independent columns, then we know that its rank is exactly 3. 
But the determinant 

$$\begin{vmatrix} 3 & 1 & 5 \\ 1 & 2 & 2 \\ 1 & 1 & 1 \end{vmatrix}$$

is not equal to 0 (namely, it is equal to $-4$, as we see by subtracting the 
second column from the first, and then expanding according to the last 
row). Hence rank $A = 3$. 

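It may be that in a $3 \times 4$ matrix, some determinant of a $3 \times 3$ 
submatrix is 0, but the $3 \times 4$ matrix has rank 3. For instance, let 

$$B = \begin{pmatrix} 3 & 1 & 2 & 5 \\ 1 & 2 & -1 & 2 \\ 4 & 3 & 1 & 1 \end{pmatrix}.$$

The determinant of the first three columns 

$$\begin{vmatrix} 3 & 1 & 2 \\ 1 & 2 & -1 \\ 4 & 3 & 1 \end{vmatrix}$$

is equal to 0 (in fact, the last row is the sum of the first two rows). 
But the determinant 

$$\begin{vmatrix} 1 & 2 & 5 \\ 2 & -1 & 2 \\ 3 & 1 & 1 \end{vmatrix}$$

is not zero (what is it?) so that again the rank of $B$ is equal to 3. 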

If the rank of a $3 \times 4$ matrix 

$$C = \begin{pmatrix} c_{11} & c_{12} & c_{13} & c_{14} \\ c_{21} & c_{22} & c_{23} & c_{24} \\ c_{31} & c_{32} & c_{33} & c_{34} \end{pmatrix}$$

is 2 or less, then the determinant of every $3 \times 3$ submatrix must be 0, 
otherwise we could argue as above to get three linearly independent 
columns. We note that there are four such subdeterminants, obtained by 
eliminating successively any one of the four columns. Conversely, if 
the determinant of every $3 \times 3$ submatrix is equal to 0, then it 
is easy to see that the rank is at most 2. Because if the rank were equal 
to 3, then there would be three linearly independent columns, and their 
determinant would not be 0. Thus we can compute such subdeterminants 
to get an estimate on the rank, and then use trial and error, and 
some judgment, to get the exact rank. 

Example 2. Let 



If we compute every $3 \times 3$ subdeterminant, we shall find 0. Hence the 
rank of $C$ is at most equal to 2. However, the first two rows are 
linearly independent, for instance because the determinant 

$$\begin{vmatrix} 3 & 1 \\ 1 & 2 \end{vmatrix}$$

is not equal to 0. It is the determinant of the first two columns of the 
$2 \times 4$ matrix 



Hence the rank is equal to 2. 

Of course, if we notice that the last row of $C$ is equal to the sum of 
the first two, then we see at once that the rank is $\leq 2$. 
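
The procedure of this section (look for the largest square submatrix with non-zero determinant) can be written as a brute-force search. The following Python sketch is illustrative only. The matrix `B` is assembled from the two $3 \times 3$ determinants displayed for $B$ above; the matrix `C_ASSUMED` is a hypothetical matrix chosen here so that its last row is the sum of the first two, and is not claimed to be the matrix of Example 2.

```python
from itertools import combinations


def det(a):
    """Determinant by expansion along the first row (empty matrix has det 1)."""
    if not a:
        return 1
    return sum(
        (-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
        for j in range(len(a))
    )


def rank_by_minors(m):
    """Largest size of a square submatrix of m with non-zero determinant."""
    rows, cols = len(m), len(m[0])
    for size in range(min(rows, cols), 0, -1):
        for r in combinations(range(rows), size):
            for c in combinations(range(cols), size):
                if det([[m[i][j] for j in c] for i in r]) != 0:
                    return size
    return 0


B = [[3, 1, 2, 5], [1, 2, -1, 2], [4, 3, 1, 1]]
# Hypothetical matrix for illustration: third row = first row + second row.
C_ASSUMED = [[3, 1, 2, 5], [1, 2, -1, 2], [4, 3, 1, 7]]

if __name__ == "__main__":
    print(rank_by_minors(B), rank_by_minors(C_ASSUMED))
```

The search tries every choice of rows and columns, so the number of subdeterminants grows quickly with the size of the matrix; for hand-sized examples this mirrors the trial and error described in the text.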

VI, §9. EXERCISES 


Compute the ranks of the following matrices. 




CHAPTER VII 


Symmetric, Hermitian, and 
Unitary Operators 


Let V be a finite dimensional vector space over the real or complex 
numbers, with a positive definite scalar product. Let 

A: V -> V 


be a linear map. We shall study three important special cases of such 
maps, named in the title of this chapter. Such maps are also represented 
by matrices bearing the same names when a basis of V has been chosen. 

In Chapter VIII we shall study such maps further and show that a 
basis can be chosen such that the maps are represented by diagonal 
matrices. This ties up with the theory of eigenvectors and eigenvalues. 


VII, §1. SYMMETRIC OPERATORS 

Throughout this section we let $V$ be a finite dimensional vector space 
over a field $K$. We suppose that $V$ has a fixed non-degenerate scalar 
product denoted by $\langle v, w \rangle$, for $v, w \in V$. 

The reader may take $V = K^n$ and may fix the scalar product to be the 
ordinary dot product 

$$\langle X, Y \rangle = {}^tXY,$$

where $X$, $Y$ are column vectors in $K^n$. However, in applications, it is not 
a good idea to fix such bases right away. 

A linear map 

$$A\colon V \to V$$

of $V$ into itself will also be called an operator. 

Lemma 1.1. Let $A\colon V \to V$ be an operator. Then there exists a unique 
operator $B\colon V \to V$ such that for all $v, w \in V$ we have 

$$\langle Av, w \rangle = \langle v, Bw \rangle.$$

Proof. Given $w \in V$ let 

$$L\colon V \to K$$

be the map such that $L(v) = \langle Av, w \rangle$. Then $L$ is immediately verified to 
be linear, so that $L$ is a functional, $L$ is an element of the dual space $V^*$. 
By Theorem 6.2 of Chapter V there exists a unique element $w' \in V$ such 
that for all $v \in V$ we have 

$$L(v) = \langle v, w' \rangle.$$

This element $w'$ depends on $w$ (and of course also on $A$). We denote this 
element $w'$ by $Bw$. The association 

$$w \mapsto Bw$$

is a mapping of $V$ into itself. It will now suffice to prove that $B$ is 
linear. Let $w_1, w_2 \in V$. Then for all $v \in V$ we get: 

$$\begin{aligned}
\langle v, B(w_1 + w_2) \rangle &= \langle Av, w_1 + w_2 \rangle = \langle Av, w_1 \rangle + \langle Av, w_2 \rangle \\
&= \langle v, Bw_1 \rangle + \langle v, Bw_2 \rangle \\
&= \langle v, Bw_1 + Bw_2 \rangle.
\end{aligned}$$

Hence $B(w_1 + w_2)$ and $Bw_1 + Bw_2$ represent the same functional and 
therefore are equal. Finally, let $c \in K$. Then 

$$\begin{aligned}
\langle v, B(cw) \rangle &= \langle Av, cw \rangle = c\langle Av, w \rangle \\
&= c\langle v, Bw \rangle \\
&= \langle v, cBw \rangle.
\end{aligned}$$

Hence $B(cw)$ and $cBw$ represent the same functional, so they are equal. 
This concludes the proof of the lemma. 

By definition, the operator $B$ in the preceding proof will be called the 
transpose of $A$ and will be denoted by ${}^tA$. The operator $A$ is said to be 
symmetric (with respect to the fixed non-degenerate scalar product $\langle\ ,\ \rangle$) 
if ${}^tA = A$. 

For any operator $A$ of $V$, we have by definition the formula 

$$\langle Av, w \rangle = \langle v, {}^tAw \rangle$$

for all $v, w \in V$. If $A$ is symmetric, then $\langle Av, w \rangle = \langle v, Aw \rangle$, and 
conversely. 

Example 1. Let $V = K^n$ and let the scalar product be the ordinary dot 
product. Then we may take $A$ as a matrix in $K$, and elements of $K^n$ 
as column vectors $X$, $Y$. Their dot product can be written as a matrix 
multiplication, 

$$\langle X, Y \rangle = {}^tXY.$$

We have 

$$\langle AX, Y \rangle = {}^t(AX)Y = {}^tX\,{}^tA\,Y = \langle X, {}^tAY \rangle,$$

where ${}^tA$ now means the transpose of the matrix $A$. Thus when we deal 
with the ordinary dot product of $n$-tuples, the transpose of the operator 
is represented by the transpose of the associated matrix. This is the 
reason why we have used the same notation in both cases. 
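
The identity of Example 1 can be checked on concrete numbers. The Python sketch below is an illustration only; the matrix `A` and the vectors `X`, `Y` are arbitrary values chosen here, and vectors are stored as plain lists.

```python
def dot(x, y):
    """Ordinary dot product of two n-tuples."""
    return sum(a * b for a, b in zip(x, y))


def matvec(a, x):
    """Matrix times column vector."""
    return [dot(row, x) for row in a]


def transpose(a):
    """Transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*a)]


# Arbitrary example values (assumptions made for this illustration).
A = [[1, 2], [3, 4]]
X = [5, 6]
Y = [7, 8]

lhs = dot(matvec(A, X), Y)              # <AX, Y>
rhs = dot(X, matvec(transpose(A), Y))   # <X, tA Y>

if __name__ == "__main__":
    print(lhs, rhs)
```

With a symmetric matrix in place of `A`, the transpose can be dropped on the right-hand side, which is the defining property of a symmetric operator.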

The transpose satisfies the following formalism: 

Theorem 1.2. Let $V$ be a finite dimensional vector space over the field 
$K$, with a non-degenerate scalar product $\langle\ ,\ \rangle$. Let $A$, $B$ be operators 
of $V$, and $c \in K$. Then: 

$$\begin{aligned}
{}^t(A + B) &= {}^tA + {}^tB, & {}^t(AB) &= {}^tB\,{}^tA, \\
{}^t(cA) &= c\,{}^tA, & {}^{tt}A &= A.
\end{aligned}$$

Proof. We prove only the second formula. For all $v, w \in V$ we have 

$$\langle ABv, w \rangle = \langle Bv, {}^tAw \rangle = \langle v, {}^tB\,{}^tAw \rangle.$$

By definition, this means that ${}^t(AB) = {}^tB\,{}^tA$. The other formulas are just 
as easy to prove. 

VII, §1. EXERCISES 

1. (a) A matrix $A$ is called skew-symmetric if ${}^tA = -A$. Show that any matrix 
$M$ can be expressed as a sum of a symmetric matrix and a skew-symmetric 
one, and that these latter are uniquely determined. [Hint: Let 
$A = \tfrac{1}{2}(M + {}^tM)$.] 

(b) Prove that if $A$ is skew-symmetric then $A^2$ is symmetric. 

(c) Let $A$ be skew-symmetric. Show that $\operatorname{Det}(A)$ is 0 if $A$ is an $n \times n$ matrix 
and $n$ is odd. 

2. Let $A$ be an invertible symmetric matrix. Show that $A^{-1}$ is symmetric. 

3. Show that a triangular symmetric matrix is diagonal. 

4. Show that the diagonal elements of a skew-symmetric matrix are equal to 0. 

5. Let $V$ be a finite dimensional vector space over the field $K$, with a 
non-degenerate scalar product. Let $v_0$, $w_0$ be elements of $V$. Let $A\colon V \to V$ be the 
linear map such that $A(v) = \langle v_0, v \rangle w_0$. Describe ${}^tA$. 

6. Let $V$ be the vector space over $\mathbf{R}$ of infinitely differentiable functions vanishing 
outside some interval. Let the scalar product be defined as usual by 

$$\langle f, g \rangle = \int_0^1 f(t)g(t)\,dt.$$

Let $D$ be the derivative. Show that one can define ${}^tD$ as before, and that 
${}^tD = -D$. 


7. Let $V$ be a finite dimensional space over the field $K$, with a non-degenerate 
scalar product. Let $A\colon V \to V$ be a linear map. Show that the image of ${}^tA$ is 
the orthogonal space to the kernel of $A$. 

8. Let $V$ be a finite dimensional space over $\mathbf{R}$, with a positive definite scalar 
product. Let $P\colon V \to V$ be a linear map such that $PP = P$. Assume that 
${}^tPP = P\,{}^tP$. Show that $P = {}^tP$. 

9. A square $n \times n$ real symmetric matrix $A$ is said to be positive definite if 
${}^tXAX > 0$ for all $X \neq O$. If $A$, $B$ are symmetric (of the same size) we define 
$A < B$ to mean that $B - A$ is positive definite. Show that if $A < B$ and 
$B < C$, then $A < C$. 

10. Let $V$ be a finite dimensional vector space over $\mathbf{R}$, with a positive definite 
scalar product $\langle\ ,\ \rangle$. An operator $A$ of $V$ is said to be semipositive if 
$\langle Av, v \rangle \geq 0$ for all $v \in V$, $v \neq O$. Suppose that $V = W + W^{\perp}$ is the direct sum 
of a subspace $W$ and its orthogonal complement. Let $P$ be the projection on 
$W$, and assume $W \neq \{O\}$. Show that $P$ is symmetric and semipositive. 

11. Let the notation be as in Exercise 10. Let $c$ be a real number, and let $A$ be 
the operator such that 

$$Av = cw$$

if we can write $v = w + w'$ with $w \in W$ and $w' \in W^{\perp}$. Show that $A$ is 
symmetric. 

12. Let the notation be as in Exercise 10. Let $P$ again be the projection on $W$. 
Show that there is a symmetric operator $A$ such that $A^2 = I + P$. 

13. Let $A$ be a real symmetric matrix. Show that there exists a real number $c$ so 
that $A + cI$ is positive. 

14. Let $V$ be a finite dimensional vector space over the field $K$, with a 
non-degenerate scalar product $\langle\ ,\ \rangle$. If $A\colon V \to V$ is a linear map such that 

$$\langle Av, Aw \rangle = \langle v, w \rangle$$

for all $v, w \in V$, show that $\operatorname{Det}(A) = \pm 1$. [Hint: Suppose first that $V = K^n$ 
with the usual scalar product. What then is ${}^tAA$? What is $\operatorname{Det}({}^tAA)$?] 

15. Let $A$, $B$ be symmetric matrices of the same size over the field $K$. Show that 
$AB$ is symmetric if and only if $AB = BA$. 


VII, §2. HERMITIAN OPERATORS 

Throughout this section we let $V$ be a finite dimensional vector 
space over the complex numbers. We suppose that $V$ has a fixed positive 
definite hermitian product as defined in Chapter V, §2. We denote this 
product by $\langle v, w \rangle$ for $v, w \in V$. 

A hermitian product is also called a hermitian form. If the readers 
wish, they may take $V = \mathbf{C}^n$, and they may take the fixed hermitian 
product to be the standard product 

$$\langle X, Y \rangle = {}^tX\overline{Y},$$

where $X$, $Y$ are column vectors of $\mathbf{C}^n$. 

Let $A\colon V \to V$ be an operator, i.e. a linear map of $V$ into itself. For 
each $w \in V$, the map 

$$L_w\colon V \to \mathbf{C}$$

such that 

$$L_w(v) = \langle Av, w \rangle$$

for all $v \in V$ is a functional. 

Theorem 2.1. Let $V$ be a finite dimensional vector space over $\mathbf{C}$ with a 
positive definite hermitian form $\langle\ ,\ \rangle$. Given a functional $L$ on $V$, there 
exists a unique $w' \in V$ such that $L(v) = \langle v, w' \rangle$ for all $v \in V$. 

Proof. The proof is similar to that given in the real case, say 
Theorem 6.2 of Chapter V. We leave it to the reader. 

From Theorem 2.1, we conclude that given $w$, there exists a unique $w'$ 
such that 

$$\langle Av, w \rangle = \langle v, w' \rangle$$

for all $v \in V$. 

Remark. The association $w \mapsto L_w$ is not an isomorphism of $V$ with the 
dual space! In fact, if $\alpha \in \mathbf{C}$, then $L_{\alpha w} = \overline{\alpha} L_w$. However, this is immaterial 
for the existence of the element $w'$. 

The map $w \mapsto w'$ of $V$ into itself will be denoted by $A^*$. We 
summarize the basic property of $A^*$ as follows. 

Lemma 2.2. Given an operator $A\colon V \to V$ there exists a unique operator 
$A^*\colon V \to V$ such that for all $v, w \in V$ we have 

$$\langle Av, w \rangle = \langle v, A^*w \rangle.$$

Proof. Similar to the proof of Lemma 1.1. 

The operator $A^*$ is called the adjoint of $A$. Note that $A^*\colon V \to V$ is 
linear, not anti-linear. No bar appears to spoil the linearity of $A^*$. 

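Example. Let $V = \mathbf{C}^n$ and let the form be the standard form given by 

$$(X, Y) \mapsto {}^tX\overline{Y} = \langle X, Y \rangle,$$

for $X$, $Y$ column vectors of $\mathbf{C}^n$. Then for any matrix $A$ representing a 
linear map of $V$ into itself, we have 

$$\langle AX, Y \rangle = {}^t(AX)\overline{Y} = {}^tX\,{}^tA\,\overline{Y} = {}^tX\,\overline{(\overline{{}^tA}\,Y)}.$$

Furthermore, by definition, the product $\langle AX, Y \rangle$ is equal to 

$$\langle X, A^*Y \rangle = {}^tX\,\overline{(A^*Y)}.$$

This means that 

$$A^* = \overline{{}^tA}.$$

We see that it would have been unreasonable to use the same symbol $t$ 
for the adjoint of an operator over $\mathbf{C}$, as for the transpose over $\mathbf{R}$. 

An operator $A$ is called hermitian (or self-adjoint) if $A^* = A$. This 
means that for all $v, w \in V$ we have 

$$\langle Av, w \rangle = \langle v, Aw \rangle.$$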

In view of the preceding example, a square matrix $A$ of complex 
numbers is called hermitian if $\overline{{}^tA} = A$, or equivalently, ${}^tA = \overline{A}$. If $A$ is a 
hermitian matrix, then we can define on $\mathbf{C}^n$ a hermitian product by the 
rule 

$$(X, Y) \mapsto {}^t(AX)\overline{Y}.$$

(Verify in detail that this map is a hermitian product.) 
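
For the standard hermitian product on $\mathbf{C}^n$, the adjoint of a matrix is its conjugate transpose, and this can be checked numerically. The Python sketch below is an illustration only; the helper names and the particular matrices and vectors are assumptions made here, and Python's built-in complex numbers are used.

```python
def herm(x, y):
    """Standard hermitian product: sum of x_i times the conjugate of y_i."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))


def matvec(a, x):
    """Matrix times column vector."""
    return [sum(r * c for r, c in zip(row, x)) for row in a]


def adjoint(a):
    """Conjugate transpose of a list-of-lists matrix."""
    return [
        [complex(a[i][j]).conjugate() for i in range(len(a))]
        for j in range(len(a[0]))
    ]


def is_hermitian(a):
    """True if the square matrix a equals its conjugate transpose."""
    n = len(a)
    return all(a[i][j] == complex(a[j][i]).conjugate() for i in range(n) for j in range(n))


# Arbitrary example values (assumptions made for this illustration).
A = [[1 + 2j, 3], [1j, 2 - 1j]]
X = [1 + 1j, 2]
Y = [1j, 1 - 1j]

lhs = herm(matvec(A, X), Y)           # <AX, Y>
rhs = herm(X, matvec(adjoint(A), Y))  # <X, A*Y>

if __name__ == "__main__":
    print(lhs, rhs, is_hermitian(A))
```

Replacing `adjoint(A)` by the plain transpose would in general break the equality, which is the point of the remark about not reusing the symbol $t$ over $\mathbf{C}$.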

The * operation satisfies rules analogous to those of the transpose, 
namely: 

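Theorem 2.3. Let $V$ be a finite dimensional vector space over $\mathbf{C}$, with a 
fixed positive definite hermitian form $\langle\ ,\ \rangle$. Let $A$, $B$ be operators of $V$, 
and let $\alpha \in \mathbf{C}$. Then 

$$\begin{aligned}
(A + B)^* &= A^* + B^*, & (AB)^* &= B^*A^*, \\
(\alpha A)^* &= \overline{\alpha}A^*, & A^{**} &= A.
\end{aligned}$$

Proof. We shall prove the third rule, leaving the others to the reader. 
We have for all $v, w \in V$: 

$$\langle \alpha Av, w \rangle = \alpha\langle Av, w \rangle = \alpha\langle v, A^*w \rangle = \langle v, \overline{\alpha}A^*w \rangle.$$

This last expression is also equal by definition to 

$$\langle v, (\alpha A)^*w \rangle$$

and consequently $(\alpha A)^* = \overline{\alpha}A^*$, as contended. 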

We have the polarization identity: 

$$\langle A(v + w), v + w \rangle - \langle A(v - w), v - w \rangle = 2[\langle Aw, v \rangle + \langle Av, w \rangle]$$

for all $v, w \in V$, or also 

$$\langle A(v + w), v + w \rangle - \langle Av, v \rangle - \langle Aw, w \rangle = \langle Av, w \rangle + \langle Aw, v \rangle.$$

The verifications of these identities are trivial, just by expanding out the 
left-hand side. 

The next theorem depends essentially on the complex numbers. Its 
analogue would be false over the real numbers. 

Theorem 2.4. Let $V$ be as before. Let $A$ be an operator such that 
$\langle Av, v \rangle = 0$ for all $v \in V$. Then $A = O$. 

Proof. The left-hand side of the polarization identity is equal to 0 for 
all $v, w \in V$. Hence we obtain 

$$\langle Aw, v \rangle + \langle Av, w \rangle = 0$$

for all $v, w \in V$. Replace $v$ by $iv$. Then by the rules for the hermitian 
product, we obtain 

$$-i\langle Aw, v \rangle + i\langle Av, w \rangle = 0,$$

whence 

$$-\langle Aw, v \rangle + \langle Av, w \rangle = 0.$$

Adding this to the first relation obtained above yields 

$$2\langle Av, w \rangle = 0,$$

whence $\langle Av, w \rangle = 0$. Hence $A = O$, as was to be shown. 

Theorem 2.5. Let $V$ be as before. Let $A$ be an operator. Then $A$ is 
hermitian if and only if $\langle Av, v \rangle$ is real for all $v \in V$. 

Proof. Suppose that $A$ is hermitian. Then 

$$\langle Av, v \rangle = \langle v, Av \rangle = \overline{\langle Av, v \rangle}.$$

Since a complex number equal to its complex conjugate must be a real 
number, we conclude that $\langle Av, v \rangle$ is real. Conversely, assume that 
$\langle Av, v \rangle$ is real for all $v \in V$. Then 

$$\langle Av, v \rangle = \overline{\langle Av, v \rangle} = \langle v, Av \rangle = \langle A^*v, v \rangle.$$

Hence $\langle (A - A^*)v, v \rangle = 0$ for all $v \in V$, and by Theorem 2.4, we conclude 
that $A - A^* = O$ whence $A = A^*$, as was to be shown. 
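
Theorem 2.5 can be illustrated on small matrices. In the Python sketch below, which is an illustration and not part of the text, `H` is a hermitian matrix and `N` a non-hermitian one; both, as well as the test vectors, are arbitrary choices made here, and the standard hermitian product on $\mathbf{C}^2$ is assumed.

```python
def herm(x, y):
    """Standard hermitian product: sum of x_i times the conjugate of y_i."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))


def matvec(a, x):
    """Matrix times column vector."""
    return [sum(r * c for r, c in zip(row, x)) for row in a]


def quadratic_value(a, v):
    """The complex number <Av, v>."""
    return herm(matvec(a, v), v)


H = [[2, 1 - 1j], [1 + 1j, 3]]  # equal to its conjugate transpose
N = [[0, 1], [0, 0]]            # not hermitian

if __name__ == "__main__":
    print(quadratic_value(H, [1 + 2j, 3 - 1j]))  # imaginary part is zero
    print(quadratic_value(N, [1, 1j]))           # imaginary part is not zero
```

For the non-hermitian matrix a single vector with non-real value is enough to show, by the theorem, that the matrix cannot be hermitian.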


VII, §2. EXERCISES 

1. Let $A$ be an invertible hermitian matrix. Show that $A^{-1}$ is hermitian. 

2. Show that the analogue of Theorem 2.4 when $V$ is a finite dimensional space 
over $\mathbf{R}$ is false. In other words, it may happen that $Av$ is perpendicular to $v$ 
for all $v \in V$ without $A$ being the zero map! 

3. Show that the analogue of Theorem 2.4 when $V$ is a finite dimensional space 
over $\mathbf{R}$ is true if we assume in addition that $A$ is symmetric. 

4. Which of the following matrices are hermitian: 


(a) 






(c) 


1 1 + i 5\ 

2 ' 

5 — i 7/ 


5. Show that the diagonal elements of a hermitian matrix are real. 

6. Show that a triangular hermitian matrix is diagonal. 

7. Let $A$, $B$ be hermitian matrices (of the same size). Show that $A + B$ is 
hermitian. If $AB = BA$, show that $AB$ is hermitian. 

8. Let $V$ be a finite dimensional vector space over $\mathbf{C}$, with a positive definite 
hermitian product. Let $A\colon V \to V$ be a hermitian operator. Show that $I + iA$ 
and $I - iA$ are invertible. [Hint: If $v \neq O$, show that $\|(I + iA)v\| \neq 0$.] 

9. Let $A$ be a hermitian matrix. Show that ${}^tA$ and $\overline{A}$ are hermitian. If $A$ is 
invertible, show that $A^{-1}$ is hermitian. 

10. Let $V$ be a finite dimensional space over $\mathbf{C}$, with a positive definite hermitian 
form $\langle\ ,\ \rangle$. Let $A\colon V \to V$ be a linear map. Show that the following 
conditions are equivalent: 

(i) We have $AA^* = A^*A$. 

(ii) For all $v \in V$, $\|Av\| = \|A^*v\|$ (where $\|v\| = \sqrt{\langle v, v \rangle}$). 

(iii) We can write $A = B + iC$, where $B$, $C$ are hermitian, and $BC = CB$. 

11. Let $A$ be a non-zero hermitian matrix. Show that $\operatorname{tr}(AA^*) > 0$. 


VII, §3. UNITARY OPERATORS 

Let $V$ be a finite dimensional vector space over $\mathbf{R}$, with a positive 
definite scalar product. 

Let $A\colon V \to V$ be a linear map. We shall say that $A$ is real unitary if 

$$\langle Av, Aw \rangle = \langle v, w \rangle$$

for all $v, w \in V$. In other words, $A$ is unitary if it preserves the 
product. You will find that in the literature, a real unitary map is also 
called an orthogonal map. The reason why we use the terminology 
unitary is given by the next theorem. 

Theorem 3.1. Let $V$ be as above. Let $A\colon V \to V$ be a linear map. The 
following conditions on $A$ are equivalent: 

(1) $A$ is unitary. 

(2) $A$ preserves the norm of vectors, i.e. for every $v \in V$, we have 

$$\|Av\| = \|v\|.$$

(3) For every unit vector $v \in V$, the vector $Av$ is also a unit vector. 

Proof. We leave the equivalence between (2) and (3) to the reader. It 
is trivial that (1) implies (2) since the square of the norm $\langle Av, Av \rangle$ is a 
special case of a product. Conversely, let us prove that (2) implies (1). 
We have 

$$\langle A(v + w), A(v + w) \rangle - \langle A(v - w), A(v - w) \rangle = 4\langle Av, Aw \rangle.$$

Using the assumption (2), and noting that the left-hand side consists of 
squares of norms, we see that the left-hand side of our equation is equal 
to 

$$\langle v + w, v + w \rangle - \langle v - w, v - w \rangle$$

which is also equal to $4\langle v, w \rangle$. From this our theorem follows at once. 

Theorem 3.1 shows why we called our maps unitary: They are 
characterized by the fact that they map unit vectors into unit vectors. 

A unitary map $U$ of course preserves perpendicularity, i.e. if $v$, $w$ are 
perpendicular then $Uv$, $Uw$ are also perpendicular, for 

$$\langle Uv, Uw \rangle = \langle v, w \rangle = 0.$$

On the other hand, it does not follow that a map which preserves 
perpendicularity is necessarily unitary. For instance, over the real numbers, 
the map which sends a vector $v$ to $2v$ preserves perpendicularity but is 
not unitary. Unfortunately, it is standard terminology to call real unitary 
maps orthogonal maps. We emphasize that such maps do more than 
preserve orthogonality: They also preserve norms. 

Theorem 3.2. Let $V$ be a finite dimensional vector space over $\mathbf{R}$, with a 
positive definite scalar product. A linear map $A\colon V \to V$ is unitary if and 
only if 

$${}^tAA = I.$$

Proof. The operator $A$ is unitary if and only if 

$$\langle Av, Aw \rangle = \langle v, w \rangle$$

for all $v, w \in V$. This condition is equivalent with 

$$\langle {}^tAAv, w \rangle = \langle v, w \rangle$$

for all $v, w \in V$, and hence is equivalent with ${}^tAA = I$. 

There remains but to interpret in terms of matrices the condition that 
$A$ be unitary. First we observe that a unitary map is invertible. Indeed, 
if $A$ is unitary and $Av = O$, then $v = O$ because $A$ preserves the norm. 

If we take $V = \mathbf{R}^n$ in Theorem 3.2, and take the usual dot product as 
the scalar product, then we can represent $A$ by a real matrix. Thus it 
is natural to define a real matrix $A$ to be unitary (or orthogonal) if 
${}^tAA = I_n$, or equivalently, 

$${}^tA = A^{-1}.$$

Example. The only unitary maps of the plane $\mathbf{R}^2$ into itself are the 
maps whose matrices are of the type 

$$\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \qquad \text{or} \qquad \begin{pmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{pmatrix}.$$

If the determinant of such a map is 1 then the matrix representing the 
map with respect to an orthonormal basis is necessarily of the first type, 
and the map is called a rotation. Drawing a picture shows immediately 
that this terminology is justified. A number of statements concerning the 
unitary maps of the plane will be given in the exercises. They are easy 
to work out, and provide good practice which it would be a pity to spoil 
in the text. These exercises are to be partly viewed as providing 
additional examples for this section. 
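
The two types of matrices in the example can be tested numerically against the condition ${}^tAA = I$. The Python sketch below is an illustration only: the function names are introduced here, the angle is an arbitrary choice, and equality is checked up to a small floating-point tolerance.

```python
import math


def transpose(a):
    """Transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Ordinary matrix product."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def is_real_unitary(a, tol=1e-12):
    """Check tA A = I entry by entry, up to the tolerance tol."""
    n = len(a)
    p = matmul(transpose(a), a)
    return all(
        abs(p[i][j] - (1.0 if i == j else 0.0)) <= tol
        for i in range(n)
        for j in range(n)
    )


def rotation(theta):
    """Matrix of the first type (determinant 1)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def reflection(theta):
    """Matrix of the second type (determinant -1)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [s, -c]]


def det2(a):
    """Determinant of a 2 x 2 matrix."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


if __name__ == "__main__":
    theta = 0.7  # arbitrary angle chosen for the illustration
    print(is_real_unitary(rotation(theta)), det2(rotation(theta)))
    print(is_real_unitary(reflection(theta)), det2(reflection(theta)))
    print(is_real_unitary([[2.0, 0.0], [0.0, 2.0]]))  # v -> 2v is not unitary
```

The last line corresponds to the remark in the text that the map sending $v$ to $2v$ preserves perpendicularity without being unitary.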

The complex case. As usual, we have analogous notions in the complex 
case. Let $V$ be a finite dimensional vector space over $\mathbf{C}$, with a positive 
definite hermitian product. Let $A\colon V \to V$ be a linear map. We define 
$A$ to be complex unitary if 

$$\langle Av, Aw \rangle = \langle v, w \rangle$$

for all $v, w \in V$. The analogue of Theorem 3.1 is true verbatim: The map 
$A$ is unitary if and only if it preserves norms and also if and only if it 
preserves unit vectors. We leave the proof as an exercise. 

Theorem 3.3. Let V be a finite dimensional vector space over C, with a 
positive definite hermitian product. A linear map A: V —► V is unitary if 
and only if 

A*A = I. 

We also leave the proof as an exercise. 

Taking $V = \mathbf{C}^n$ with the usual hermitian form given by 

$$\langle X, Y \rangle = x_1\overline{y}_1 + \cdots + x_n\overline{y}_n,$$

we can represent $A$ by a complex matrix. Thus it is natural to define a 
complex matrix $A$ to be unitary if $\overline{{}^tA}A = I_n$, or 

$$\overline{{}^tA} = A^{-1}.$$

Theorem 3.4. Let $V$ be a vector space which is either over $\mathbf{R}$ with a 
positive definite scalar product, or over $\mathbf{C}$ with a positive definite 
hermitian product. Let 

$$A\colon V \to V$$

be a linear map. Let $\{v_1, \ldots, v_n\}$ be an orthonormal basis of $V$. 

(a) If $A$ is unitary then $\{Av_1, \ldots, Av_n\}$ is an orthonormal basis. 

(b) Let $\{w_1, \ldots, w_n\}$ be another orthonormal basis. Suppose that 
$Av_i = w_i$ for $i = 1, \ldots, n$. Then $A$ is unitary. 

Proof. The proof is immediate from the definitions and will be left as 
an exercise. See Exercises 1 and 2. 


VII, §3. EXERCISES 

1. (a) Let $V$ be a finite dimensional space over $\mathbf{R}$, with a positive definite scalar 
product. Let $\{v_1, \ldots, v_n\}$ and $\{w_1, \ldots, w_n\}$ be orthonormal bases. Let 
$A\colon V \to V$ be an operator of $V$ such that $Av_i = w_i$. Show that $A$ is real 
unitary. 

(b) State and prove the analogous result in the complex case. 

2. Let $V$ be as in Exercise 1. Let $\{v_1, \ldots, v_n\}$ be an orthonormal basis of $V$. Let 
$A$ be a unitary operator of $V$. Show that $\{Av_1, \ldots, Av_n\}$ is an orthonormal 
basis. 

3. Let $A$ be a real unitary matrix. 

(a) Show that ${}^tA$ is unitary. 

(b) Show that $A^{-1}$ exists and is unitary. 

(c) If $B$ is real unitary, show that $AB$ is unitary, and that $B^{-1}AB$ is unitary. 

4. Let $A$ be a complex unitary matrix. 

(a) Show that ${}^tA$ is unitary. 

(b) Show that $A^{-1}$ exists and is unitary. 

(c) If $B$ is complex unitary, show that $AB$ is unitary, and that $B^{-1}AB$ is 
unitary. 

5. (a) Let $V$ be a finite dimensional space over $\mathbf{R}$, with a positive definite scalar 
product, and let $\{v_1, \ldots, v_n\} = \mathcal{B}$ and $\{w_1, \ldots, w_n\} = \mathcal{B}'$ be orthonormal 
bases of $V$. Show that the matrix $M^{\mathcal{B}}_{\mathcal{B}'}(\mathrm{id})$ is real unitary. [Hint: Use 
$\langle w_i, w_i \rangle = 1$ and $\langle w_i, w_j \rangle = 0$ if $i \neq j$, as well as the expression 
$w_i = \sum_j a_{ij}v_j$, for some $a_{ij} \in \mathbf{R}$.] 

(b) Let $F\colon V \to V$ be such that $F(v_i) = w_i$ for all $i$. Show that $M^{\mathcal{B}}_{\mathcal{B}'}(F)$ is 
unitary. 

6. Show that the absolute value of the determinant of a real unitary matrix is 
equal to 1. Conclude that if $A$ is real unitary, then $\operatorname{Det}(A) = 1$ or $-1$. 

7. If $A$ is a complex square matrix, show that $\operatorname{Det}(\overline{A}) = \overline{\operatorname{Det}(A)}$. Conclude that 
the absolute value of the determinant of a complex unitary matrix is equal 
to 1. 

8. Let $A$ be a diagonal real unitary matrix. Show that the diagonal elements of 
$A$ are equal to 1 or $-1$. 

9. Let $A$ be a diagonal complex unitary matrix. Show that each diagonal 
element has absolute value 1, and hence is of type $e^{i\theta}$, with real $\theta$. 

The following exercises describe various properties of real unitary maps of the 
plane $\mathbf{R}^2$. 

10. Let $V$ be a 2-dimensional vector space over $\mathbf{R}$, with a positive definite scalar 
product, and let $A$ be a real unitary map of $V$ into itself. Let $\{v_1, v_2\}$ and 
$\{w_1, w_2\}$ be orthonormal bases of $V$ such that $Av_i = w_i$ for $i = 1, 2$. Let $a$, $b$, 
$c$, $d$ be real numbers such that 

$$\begin{aligned} w_1 &= av_1 + bv_2, \\ w_2 &= cv_1 + dv_2. \end{aligned}$$

Show that $a^2 + b^2 = 1$, $c^2 + d^2 = 1$, $ac + bd = 0$, $a^2 = d^2$ and $c^2 = b^2$. 

11. Show that the determinant $ad - bc$ is equal to 1 or $-1$. (Show that its 
square is equal to 1.) 

12. Define a rotation of $V$ to be a real unitary map $A$ of $V$ whose determinant is 
1. Show that the matrix of $A$ relative to an orthonormal basis of $V$ is of type 

$$\begin{pmatrix} a & -b \\ b & a \end{pmatrix}$$

for some real numbers $a$, $b$ such that $a^2 + b^2 = 1$. Also prove the converse, 
that any linear map of $V$ into itself represented by such a matrix on an 
orthonormal basis is unitary, and has determinant 1. Using calculus, one can 
then conclude that there exists a number $\theta$ such that $a = \cos\theta$ and $b = \sin\theta$. 

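13. Show that there exists a complex unitary matrix $U$ such that, if 

$$A = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \qquad \text{and} \qquad B = \begin{pmatrix} e^{i\theta} & 0 \\ 0 & e^{-i\theta} \end{pmatrix},$$

then $U^{-1}AU = B$. 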

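14. Let $V = \mathbf{C}$ be viewed as a vector space of dimension 2 over $\mathbf{R}$. Let $\alpha \in \mathbf{C}$, 
and let $L_\alpha\colon \mathbf{C} \to \mathbf{C}$ be the map $z \mapsto \alpha z$. Show that $L_\alpha$ is an $\mathbf{R}$-linear map of $V$ 
into itself. For which complex numbers $\alpha$ is $L_\alpha$ a unitary map with respect to 
the scalar product $\langle z, w \rangle = \operatorname{Re}(z\overline{w})$? What is the matrix of $L_\alpha$ with respect to 
the basis $\{1, i\}$ of $\mathbf{C}$ over $\mathbf{R}$? 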



CHAPTER VIII 


Eigenvectors and 
Eigenvalues 


This chapter gives the basic elementary properties of eigenvectors and 
eigenvalues. We get an application of determinants in computing the 
characteristic polynomial. In §3, we also get an elegant mixture of 
calculus and linear algebra by relating eigenvectors with the problem of 
finding the maximum and minimum of a quadratic function on the 
sphere. Most students taking linear algebra will have had some calculus, 
but the proof using complex numbers instead of the maximum principle 
can be used to get real eigenvalues of a symmetric matrix if the calculus 
has to be avoided. Basic properties of the complex numbers will be 
recalled in an appendix. 


VIII, §1. EIGENVECTORS AND EIGENVALUES 

Let V be a vector space and let

$$A : V \to V$$

be a linear map of V into itself. An element $v \in V$ is called an eigenvector
of A if there exists a number $\lambda$ such that $Av = \lambda v$. If $v \neq O$ then $\lambda$ is
uniquely determined, because $\lambda_1 v = \lambda_2 v$ implies $\lambda_1 = \lambda_2$. In this case, we
say that $\lambda$ is an eigenvalue of A belonging to the eigenvector v. We also
say that v is an eigenvector with the eigenvalue $\lambda$. Instead of eigenvector
and eigenvalue, one also uses the terms characteristic vector and
characteristic value.

If A is a square $n \times n$ matrix then an eigenvector of A is by definition
an eigenvector of the linear map of $K^n$ into itself represented by this
matrix. Thus an eigenvector X of A is a (column) vector of $K^n$ for
which there exists $\lambda \in K$ such that $AX = \lambda X$.


Example 1. Let V be the vector space over R consisting of all infinitely
differentiable functions. Let $\lambda \in \mathbf{R}$. Then the function f such that
$f(t) = e^{\lambda t}$ is an eigenvector of the derivative d/dt because
$df/dt = \lambda e^{\lambda t}$.


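Example 2. Let

$$A = \begin{pmatrix} a_1 & 0 & \cdots & 0 \\ 0 & a_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_n \end{pmatrix}$$

be a diagonal matrix. Then every unit vector $E^i$ (i = 1,...,n) is an
eigenvector of A. In fact, we have $AE^i = a_i E^i$.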



Example 3. If $A: V \to V$ is a linear map, and v is an eigenvector of A,
then for any non-zero scalar c, cv is also an eigenvector of A, with the
same eigenvalue.


Theorem 1.1. Let V be a vector space and let $A : V \to V$ be a linear
map. Let $\lambda \in K$. Let $V_\lambda$ be the subspace of V generated by all
eigenvectors of A having $\lambda$ as eigenvalue. Then every non-zero element of
$V_\lambda$ is an eigenvector of A having $\lambda$ as eigenvalue.


Proof. Let $v_1, v_2 \in V$ be such that $Av_1 = \lambda v_1$ and $Av_2 = \lambda v_2$. Then

$$A(v_1 + v_2) = Av_1 + Av_2 = \lambda v_1 + \lambda v_2 = \lambda(v_1 + v_2).$$

If $c \in K$ then $A(cv_1) = cAv_1 = c\lambda v_1 = \lambda cv_1$. This proves our theorem.

The subspace $V_\lambda$ in Theorem 1.1 is called the eigenspace of A belonging
to $\lambda$.





Note. If $v_1, v_2$ are eigenvectors of A with different eigenvalues $\lambda_1 \neq \lambda_2$
then of course $v_1 + v_2$ is not an eigenvector of A. In fact, we have the
following theorem:

Theorem 1.2. Let V be a vector space and let $A: V \to V$ be a linear
map. Let $v_1, \ldots, v_m$ be eigenvectors of A, with eigenvalues $\lambda_1, \ldots, \lambda_m$
respectively. Assume that these eigenvalues are distinct, i.e.

$$\lambda_i \neq \lambda_j \quad\text{if } i \neq j.$$

Then $v_1, \ldots, v_m$ are linearly independent.

Proof. By induction on m. For m = 1, an element $v_1 \in V$, $v_1 \neq O$ is
linearly independent. Assume m > 1. Suppose that we have a relation

$$(*) \qquad c_1 v_1 + \cdots + c_m v_m = O$$

with scalars $c_i$. We must prove all $c_i = 0$. We multiply our relation (*)
by $\lambda_1$ to obtain

$$c_1 \lambda_1 v_1 + \cdots + c_m \lambda_1 v_m = O.$$

We also apply A to our relation (*). By linearity, we obtain

$$c_1 \lambda_1 v_1 + \cdots + c_m \lambda_m v_m = O.$$

We now subtract these last two expressions, and obtain

$$c_2(\lambda_2 - \lambda_1)v_2 + \cdots + c_m(\lambda_m - \lambda_1)v_m = O.$$

Since $\lambda_j - \lambda_1 \neq 0$ for j = 2,...,m we conclude by induction that

$$c_2 = \cdots = c_m = 0.$$

Going back to our original relation, we see that $c_1 v_1 = O$, whence $c_1 = 0$,
and our theorem is proved.

Example 4. Let V be the vector space consisting of all differentiable
functions of a real variable t. Let $a_1, \ldots, a_m$ be distinct numbers. The
functions

$$e^{a_1 t}, \ldots, e^{a_m t}$$

are eigenvectors of the derivative, with distinct eigenvalues $a_1, \ldots, a_m$, and
hence are linearly independent.





Remark 1. In Theorem 1.2, suppose V is a vector space of dimension
n and $A: V \to V$ is a linear map having n eigenvectors $v_1, \ldots, v_n$ whose
eigenvalues $\lambda_1, \ldots, \lambda_n$ are distinct. Then $\{v_1, \ldots, v_n\}$ is a basis of V.

Remark 2. One meets a situation like that of Theorem 1.2 in the
theory of linear differential equations. Let $A = (a_{ij})$ be an $n \times n$ matrix,
and let

$$F(t) = \begin{pmatrix} f_1(t) \\ \vdots \\ f_n(t) \end{pmatrix}$$

be a column vector of functions satisfying the equation

$$\frac{dF}{dt} = AF(t).$$

In terms of the coordinates, this means that

$$\frac{df_i}{dt} = \sum_{j=1}^{n} a_{ij} f_j(t).$$

Now suppose that A is a diagonal matrix,

$$A = \begin{pmatrix} a_1 & 0 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & a_n \end{pmatrix} \quad\text{with } a_i \neq 0 \text{ all } i.$$

Then each function $f_i(t)$ satisfies the equation

$$\frac{df_i}{dt} = a_i f_i(t).$$

By calculus, there exist numbers $c_1, \ldots, c_n$ such that for i = 1,...,n we
have

$$f_i(t) = c_i e^{a_i t}.$$

[Proof: if $df/dt = af(t)$, then the derivative of $f(t)/e^{at}$ is 0, so $f(t)/e^{at}$ is
constant.] Conversely, if $c_1, \ldots, c_n$ are numbers, and we let

$$F(t) = \begin{pmatrix} c_1 e^{a_1 t} \\ \vdots \\ c_n e^{a_n t} \end{pmatrix},$$

then F(t) satisfies the differential equation

$$\frac{dF}{dt} = AF(t).$$

Let V be the set of solutions F(t) for the differential equation
$dF/dt = AF(t)$. Then V is immediately verified to be a vector space, and the
above argument shows that the n elements

$$\begin{pmatrix} e^{a_1 t} \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \ \ldots, \ \begin{pmatrix} 0 \\ \vdots \\ 0 \\ e^{a_n t} \end{pmatrix}$$

form a basis for V. Furthermore, these elements are eigenvectors of A,
and also of the derivative (viewed as a linear map).
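The diagonal case of Remark 2 can be checked numerically. The following is only a sketch: the values of $a_i$, $c_i$ and t are arbitrary illustrative choices, and the derivative is approximated by a central difference rather than computed exactly.

```python
import math

def solution(c, a, t):
    """F(t) with coordinates c_i * exp(a_i * t)."""
    return [ci * math.exp(ai * t) for ci, ai in zip(c, a)]

def derivative(c, a, t, h=1e-6):
    """Central-difference approximation of dF/dt (an approximation, not exact)."""
    plus = solution(c, a, t + h)
    minus = solution(c, a, t - h)
    return [(p - m) / (2 * h) for p, m in zip(plus, minus)]

a = [2.0, -1.0, 0.5]   # diagonal entries a_i (illustrative values)
c = [1.0, 3.0, -2.0]   # arbitrary constants c_i
t = 0.7                # arbitrary sample time

lhs = derivative(c, a, t)                                   # dF/dt
rhs = [ai * fi for ai, fi in zip(a, solution(c, a, t))]     # A F(t) for diagonal A
max_error = max(abs(l - r) for l, r in zip(lhs, rhs))
print(max_error)
```

The printed error should be tiny, reflecting only the finite-difference approximation and rounding.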

The above is valid if A is a diagonal matrix. If A is not diagonal, 
then we try to find a basis such that we can represent the linear map A 
by a diagonal matrix. 

Quite generally, let V be a finite dimensional vector space, and let

$$L: V \to V$$

be a linear map. Let $\{v_1, \ldots, v_n\}$ be a basis of V. We say that this basis
diagonalizes L if each $v_i$ is an eigenvector of L, so $Lv_i = c_i v_i$ with some
scalar $c_i$. Then the matrix representing L with respect to this basis is the
diagonal matrix

$$\begin{pmatrix} c_1 & 0 & \cdots & 0 \\ 0 & c_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & c_n \end{pmatrix}.$$

We say that the linear map L can be diagonalized if there exists a basis
of V consisting of eigenvectors. Later in this chapter we show that if A
is a symmetric matrix and

$$L_A : \mathbf{R}^n \to \mathbf{R}^n$$

is the associated linear map, then $L_A$ can be diagonalized. We say that
an $n \times n$ matrix A can be diagonalized if its associated linear map $L_A$
can be diagonalized.


VIII, §1. EXERCISES 

1. Let $a \in K$ and $a \neq 0$. Prove that the eigenvectors of the matrix



generate a 1-dimensional space, and give a basis for this space. 

2. Prove that the eigenvectors of the matrix 



generate a 2-dimensional space and give a basis for this space. What are the 
eigenvalues of this matrix? 

3. Let A be a diagonal matrix with diagonal elements $a_{11}, \ldots, a_{nn}$. What is the
dimension of the space generated by the eigenvectors of A? Exhibit a basis
for the space, and give the eigenvalues.

4. Let $A = (a_{ij})$ be an $n \times n$ matrix such that for each i = 1,...,n we have

$$\sum_{j=1}^{n} a_{ij} = 0.$$

Show that 0 is an eigenvalue of A.

5. (a) Show that if $\theta \in \mathbf{R}$, then the matrix

$$A = \begin{pmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{pmatrix}$$

always has an eigenvector in $\mathbf{R}^2$, and in fact that there exists a vector $v_1$
such that $Av_1 = v_1$. [Hint: Let the first component of $v_1$ be

$$x = \frac{\sin\theta}{1 - \cos\theta}$$

if $\cos\theta \neq 1$. Then solve for y. What if $\cos\theta = 1$?]

(b) Let $v_2$ be a vector of $\mathbf{R}^2$ perpendicular to the vector $v_1$ found in (a). Show
that $Av_2 = -v_2$. Define this to mean that A is a reflection.

6. Let

$$R(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$

be the matrix of a rotation. Show that $R(\theta)$ does not have any real
eigenvalues unless $R(\theta) = \pm I$. [It will be easier to do this exercise after you have
read the next section.]

7. Let V be a finite dimensional vector space. Let A, B be linear maps of V into
itself. Assume that AB = BA. Show that if v is an eigenvector of A, with
eigenvalue $\lambda$, then Bv is an eigenvector of A, with eigenvalue $\lambda$ also if $Bv \neq O$.



VIII, §2. THE CHARACTERISTIC POLYNOMIAL 

We shall now see how we can use determinants to find the eigenvalues of
a matrix.

Theorem 2.1. Let V be a finite dimensional vector space, and let $\lambda$ be a
number. Let $A : V \to V$ be a linear map. Then $\lambda$ is an eigenvalue of A if
and only if $A - \lambda I$ is not invertible.

Proof. Assume that $\lambda$ is an eigenvalue of A. Then there exists an
element $v \in V$, $v \neq O$ such that $Av = \lambda v$. Hence $Av - \lambda v = O$, and
$(A - \lambda I)v = O$. Hence $A - \lambda I$ has a non-zero kernel, and $A - \lambda I$ cannot
be invertible. Conversely, assume that $A - \lambda I$ is not invertible. By
Theorem 3.3 of Chapter III, we see that $A - \lambda I$ must have a non-zero
kernel, meaning that there exists an element $v \in V$, $v \neq O$ such that
$(A - \lambda I)v = O$. Hence $Av - \lambda v = O$, and $Av = \lambda v$. Thus $\lambda$ is an
eigenvalue of A. This proves our theorem.

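Let A be an $n \times n$ matrix, $A = (a_{ij})$. We define the characteristic
polynomial $P_A$ to be the determinant

$$P_A(t) = \mathrm{Det}(tI - A),$$

or written out in full,

$$P_A(t) = \begin{vmatrix} t - a_{11} & -a_{12} & \cdots & -a_{1n} \\ -a_{21} & t - a_{22} & \cdots & -a_{2n} \\ \vdots & \vdots & & \vdots \\ -a_{n1} & -a_{n2} & \cdots & t - a_{nn} \end{vmatrix}.$$

We can also view A as a linear map from $K^n$ to $K^n$, and we also say
that $P_A(t)$ is the characteristic polynomial of this linear map.

Example 1. The characteristic polynomial of the matrix

$$A = \begin{pmatrix} 1 & -1 & 3 \\ -2 & 1 & 1 \\ 0 & 1 & -1 \end{pmatrix}$$

is

$$\begin{vmatrix} t - 1 & 1 & -3 \\ 2 & t - 1 & -1 \\ 0 & -1 & t + 1 \end{vmatrix},$$

which we expand according to the first column, to find

$$P_A(t) = t^3 - t^2 - 4t + 6.$$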

For an arbitrary matrix $A = (a_{ij})$, the characteristic polynomial can be
found by expanding according to the first column, and will always
consist of a sum

$$(t - a_{11}) \cdots (t - a_{nn}) + \cdots$$

Each term other than the one we have written down will have degree
< n. Hence the characteristic polynomial is of type

$$P_A(t) = t^n + \text{terms of lower degree}.$$
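For readers who want to check a characteristic polynomial by machine, here is a minimal sketch. It does not expand the determinant by the first column as in the text; instead it uses the Faddeev-LeVerrier recursion (a different, standard technique) with exact rational arithmetic. The function names are illustrative, and the test matrix is the one read off from the determinant in Example 1.

```python
from fractions import Fraction

def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def char_poly(A):
    """Coefficients of Det(tI - A), highest degree first: [1, c_{n-1}, ..., c_0]."""
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    M = [[Fraction(0)] * n for _ in range(n)]   # auxiliary matrix, starts at 0
    coeffs = [Fraction(1)]                      # leading coefficient of t^n
    for k in range(1, n + 1):
        # Faddeev-LeVerrier step: M <- A*M + (last coefficient)*I
        M = mat_mul(A, M)
        for i in range(n):
            M[i][i] += coeffs[-1]
        AM = mat_mul(A, M)
        trace = sum(AM[i][i] for i in range(n))
        coeffs.append(-trace / k)
    return coeffs

A = [[1, -1, 3], [-2, 1, 1], [0, 1, -1]]   # matrix of Example 1
print(char_poly(A))   # coefficients of t^3 - t^2 - 4t + 6
```

The leading coefficient is always 1, in agreement with the statement that $P_A(t) = t^n$ + terms of lower degree.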

Theorem 2.2. Let A be an $n \times n$ matrix. A number $\lambda$ is an eigenvalue
of A if and only if $\lambda$ is a root of the characteristic polynomial of A.

Proof. Assume that $\lambda$ is an eigenvalue of A. Then $\lambda I - A$ is not
invertible by Theorem 2.1, and hence $\mathrm{Det}(\lambda I - A) = 0$, by Theorem 5.3 of
Chapter VI. Consequently $\lambda$ is a root of the characteristic polynomial.
Conversely, if $\lambda$ is a root of the characteristic polynomial, then

$$\mathrm{Det}(\lambda I - A) = 0,$$

and hence by the same Theorem 5.3 of Chapter VI we conclude that
$\lambda I - A$ is not invertible. Hence $\lambda$ is an eigenvalue of A by Theorem 2.1.

Theorem 2.2 gives us an explicit way of determining the eigenvalues of
a matrix, provided that we can determine explicitly the roots of its
characteristic polynomial. This is sometimes easy, especially in exercises at the
end of chapters when the matrices are adjusted in such a way that one
can determine the roots by inspection, or simple devices. It is
considerably harder in other cases.

For instance, to determine the roots of the polynomial in Example 1,
one would have to develop the theory of cubic polynomials. This can be
done, but it involves formulas which are somewhat harder than the
formula needed to solve a quadratic equation. One can also find methods
to determine roots approximately. In any case, the determination of such
methods belongs to another range of ideas than that studied in the
present chapter.


Example 2. Find the eigenvalues and a basis for the eigenspaces of the
matrix

$$A = \begin{pmatrix} 1 & 4 \\ 2 & 3 \end{pmatrix}.$$

The characteristic polynomial is the determinant

$$\begin{vmatrix} t - 1 & -4 \\ -2 & t - 3 \end{vmatrix} = (t - 1)(t - 3) - 8 = t^2 - 4t - 5 = (t - 5)(t + 1).$$

Hence the eigenvalues are 5, -1.

For any eigenvalue $\lambda$, a corresponding eigenvector is a vector
$\begin{pmatrix} x \\ y \end{pmatrix}$ such that

$$x + 4y = \lambda x,$$
$$2x + 3y = \lambda y,$$

or equivalently

$$(1 - \lambda)x + 4y = 0,$$
$$2x + (3 - \lambda)y = 0.$$

We give x some value, say x = 1, and solve for y from either equation,
for instance the second to get $y = -2/(3 - \lambda)$. This gives us the
eigenvector

$$X(\lambda) = \begin{pmatrix} 1 \\ -2/(3 - \lambda) \end{pmatrix}.$$

Substituting $\lambda = 5$ and $\lambda = -1$ gives us the two eigenvectors

$$X^1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix} \text{ for } \lambda = 5, \quad\text{and}\quad X^2 = \begin{pmatrix} 1 \\ -1/2 \end{pmatrix} \text{ for } \lambda = -1.$$

The eigenspace for 5 has basis $X^1$ and the eigenspace for -1 has basis
$X^2$. Note that any non-zero scalar multiples of these vectors would also
be bases. For instance, instead of $X^2$ we could take

$$\begin{pmatrix} 2 \\ -1 \end{pmatrix}.$$

Example 3. Find the eigenvalues and a basis for the eigenspaces of the
matrix

$$A = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 1 & -1 \\ 0 & 2 & 4 \end{pmatrix}.$$

The characteristic polynomial is the determinant

$$\begin{vmatrix} t - 2 & -1 & 0 \\ 0 & t - 1 & 1 \\ 0 & -2 & t - 4 \end{vmatrix} = (t - 2)^2 (t - 3).$$

Hence the eigenvalues are 2 and 3.

For the eigenvectors, we must solve the equations

$$(2 - \lambda)x + y = 0,$$
$$(1 - \lambda)y - z = 0,$$
$$2y + (4 - \lambda)z = 0.$$

Note the coefficient $(2 - \lambda)$ of x.

Suppose we want to find the eigenspace with eigenvalue $\lambda = 2$. Then
the first equation becomes y = 0, whence z = 0 from the second
equation. We can give x any value, say x = 1. Then the vector

$$X^1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$$

is a basis for the eigenspace with eigenvalue 2.

Now suppose $\lambda \neq 2$, so $\lambda = 3$. If we put x = 1 then we can solve for
y from the first equation to give y = 1, and then we can solve for z in
the second equation, to get z = -2. Hence

$$X^2 = \begin{pmatrix} 1 \\ 1 \\ -2 \end{pmatrix}$$

is a basis for the eigenvectors with eigenvalue 3. Any non-zero scalar
multiple of $X^2$ would also be a basis.
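In Example 3 the root 2 occurs twice in the characteristic polynomial, yet its eigenspace has only one basis vector. A short sketch can confirm this by computing the dimension of the kernel of $A - \lambda I$ as n minus the rank. The helper names are illustrative, and the matrix is the one read off from the determinant in the example.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols):
        # Look for a pivot in column c among the rows not yet used.
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            factor = M[i][c] / M[r][c]
            M[i] = [x - factor * y for x, y in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return r

def eigenspace_dim(A, lam):
    """Dimension of the kernel of A - lam*I (zero if lam is not an eigenvalue)."""
    n = len(A)
    shifted = [[A[i][j] - (lam if i == j else 0) for j in range(n)]
               for i in range(n)]
    return n - rank(shifted)

A = [[2, 1, 0], [0, 1, -1], [0, 2, 4]]   # matrix of Example 3
print(eigenspace_dim(A, 2), eigenspace_dim(A, 3))
```

Both eigenspaces come out 1-dimensional, so this matrix does not have a basis of eigenvectors in the sense of §1.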
Example 4. The characteristic polynomial of the matrix

$$\begin{pmatrix} 1 & 1 & 2 \\ 0 & 5 & -1 \\ 0 & 0 & 7 \end{pmatrix}$$

is $(t - 1)(t - 5)(t - 7)$. Can you generalize this?

Example 5. Find the eigenvalues and a basis for the eigenspaces of the
matrix in Example 4.

The eigenvalues are 1, 5, and 7. Let X be a non-zero eigenvector, say

$$X = \begin{pmatrix} x \\ y \\ z \end{pmatrix}, \quad\text{also written } {}^tX = (x, y, z).$$

Then by definition of an eigenvector, there is a number $\lambda$ such that
$AX = \lambda X$, which means

$$x + y + 2z = \lambda x,$$
$$5y - z = \lambda y,$$
$$7z = \lambda z.$$

Case 1. z = 0, y = 0. Since we want a non-zero eigenvector we must
then have $x \neq 0$, in which case $\lambda = 1$ by the first equation. Let $X^1 = E^1$
be the first unit vector, or any non-zero scalar multiple to get an
eigenvector with eigenvalue 1.

Case 2. z = 0, $y \neq 0$. By the second equation, we must have $\lambda = 5$.
Give y a specific value, say y = 1. Then solve the first equation for x,
namely

$$x + 1 = 5x, \quad\text{which gives } x = \tfrac{1}{4}.$$

Let

$$X^2 = \begin{pmatrix} 1/4 \\ 1 \\ 0 \end{pmatrix}.$$

Then $X^2$ is an eigenvector with eigenvalue 5.

Case 3. $z \neq 0$. Then from the third equation, we must have $\lambda = 7$.
Fix some non-zero value of z, say z = 1. Then we are reduced to solving
the two simultaneous equations

$$x + y + 2 = 7x,$$
$$5y - 1 = 7y.$$

This yields $y = -\tfrac{1}{2}$ and $x = \tfrac{1}{4}$. Let

$$X^3 = \begin{pmatrix} 1/4 \\ -1/2 \\ 1 \end{pmatrix}.$$

Then $X^3$ is an eigenvector with eigenvalue 7.

Scalar multiples of $X^1, X^2, X^3$ will yield eigenvectors with the same
eigenvalues as $X^1, X^2, X^3$ respectively. Since these three vectors have
distinct eigenvalues, they are linearly independent, and so form a basis of
$\mathbf{R}^3$. By Exercise 14, there are no other eigenvectors.
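The three cases of Example 5 can be verified with exact arithmetic. The sketch below simply checks $AX = \lambda X$ for each vector found above; the dictionary layout and the helper name `apply` are illustrative choices.

```python
from fractions import Fraction as F

A = [[1, 1, 2], [0, 5, -1], [0, 0, 7]]   # matrix of Example 4

# Eigenvalue -> eigenvector found in Cases 1, 2, 3.
pairs = {
    1: [F(1), F(0), F(0)],
    5: [F(1, 4), F(1), F(0)],
    7: [F(1, 4), F(-1, 2), F(1)],
}

def apply(A, X):
    """Matrix-vector product AX."""
    return [sum(a * x for a, x in zip(row, X)) for row in A]

# For each pair, compare AX with lambda * X coordinate by coordinate.
checks = {lam: apply(A, X) == [lam * x for x in X] for lam, X in pairs.items()}
print(checks)
```

Every entry of `checks` should be `True`.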

Suppose now that the field of scalars K is the complex numbers. We 
then use the fact proved in an appendix: 

Every non-constant polynomial with complex coefficients has a complex 
root. 

If A is a complex $n \times n$ matrix, then the characteristic polynomial of A
has complex coefficients, and has degree $n \geq 1$, so has a complex root
which is an eigenvalue. Thus we have:

Theorem 2.3. Let A be an n x n matrix with complex components. 
Then A has a non-zero eigenvector and an eigenvalue in the complex 
numbers. 

This is not always true over the real numbers. (Example?) In the next 
section, we shall see an important case when a real matrix always has a 
real eigenvalue. 

Theorem 2.4. Let A, B be two $n \times n$ matrices, and assume that B is
invertible. Then the characteristic polynomial of A is equal to the
characteristic polynomial of $B^{-1}AB$.

Proof. By definition, and properties of the determinant,

$$\mathrm{Det}(tI - A) = \mathrm{Det}(B^{-1}(tI - A)B) = \mathrm{Det}(tB^{-1}B - B^{-1}AB) = \mathrm{Det}(tI - B^{-1}AB).$$

This proves what we wanted.
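Theorem 2.4 is easy to test on a small case. The sketch below is restricted to $2 \times 2$ matrices, where the characteristic polynomial is $t^2 - (\text{trace})t + \text{determinant}$. The matrix A is the one from Example 2, and B is an arbitrary invertible matrix chosen for illustration.

```python
from fractions import Fraction

def mul(X, Y):
    """Product of two 2 x 2 matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse2(B):
    """Inverse of an invertible 2 x 2 matrix via the adjugate formula."""
    det = Fraction(B[0][0] * B[1][1] - B[0][1] * B[1][0])
    return [[B[1][1] / det, -B[0][1] / det],
            [-B[1][0] / det, B[0][0] / det]]

def char_poly2(M):
    """Coefficients (1, -trace, det) of Det(tI - M) for a 2 x 2 matrix M."""
    trace = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return (1, -trace, det)

A = [[1, 4], [2, 3]]       # matrix of Example 2
B = [[2, 1], [1, 1]]       # hypothetical invertible matrix (determinant 1)
C = mul(mul(inverse2(B), A), B)   # B^{-1} A B

print(char_poly2(A), char_poly2(C))
```

Both polynomials agree, although the matrices A and $B^{-1}AB$ themselves look quite different.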
Let

$$L: V \to V$$

be a linear map of a finite dimensional vector space into itself, so L is an
operator. Select a basis $\mathscr{B}$ for V and let

$$A = M_{\mathscr{B}}^{\mathscr{B}}(L)$$

be the matrix associated with L with respect to this basis. We then
define the characteristic polynomial of L to be the characteristic polynomial
of A. If we change basis, then A changes to $B^{-1}AB$ where B is
invertible. By Theorem 2.4, this implies that the characteristic polynomial does
not depend on the choice of basis.

Theorem 2.3 can be interpreted for L as stating:

Let V be a finite dimensional vector space over C of dimension > 0.
Let $L: V \to V$ be an operator. Then L has a non-zero eigenvector and
an eigenvalue in the complex numbers.

We now give examples of computations using complex numbers for 
the eigenvalues and eigenvectors, even though the matrix itself has real 
components. It should be remembered that in the case of complex eigen¬ 
values, the vector space is over the complex numbers, so it consists of 
linear combinations of the given basis elements with complex coefficients. 

Example 6. Find the eigenvalues and a basis for the eigenspaces of the
matrix

$$A = \begin{pmatrix} 2 & -1 \\ 3 & 1 \end{pmatrix}.$$

The characteristic polynomial is the determinant

$$\begin{vmatrix} t - 2 & 1 \\ -3 & t - 1 \end{vmatrix} = (t - 2)(t - 1) + 3 = t^2 - 3t + 5.$$

Hence the eigenvalues are

$$\frac{3 \pm \sqrt{9 - 20}}{2}.$$

Thus there are two distinct eigenvalues (but no real eigenvalue):

$$\lambda_1 = \frac{3 + \sqrt{-11}}{2} \quad\text{and}\quad \lambda_2 = \frac{3 - \sqrt{-11}}{2}.$$

Let $X = \begin{pmatrix} x \\ y \end{pmatrix}$ with not both x, y equal to 0. Then X is an
eigenvector if and only if $AX = \lambda X$, that is:

$$2x - y = \lambda x,$$
$$3x + y = \lambda y,$$

where $\lambda$ is an eigenvalue. This system is equivalent with

$$(2 - \lambda)x - y = 0,$$
$$3x + (1 - \lambda)y = 0.$$

We give x, say, an arbitrary value, for instance x = 1 and solve for y, so
$y = (2 - \lambda)$ from the first equation. Then we obtain the eigenvectors

$$X(\lambda_1) = \begin{pmatrix} 1 \\ 2 - \lambda_1 \end{pmatrix} \quad\text{and}\quad X(\lambda_2) = \begin{pmatrix} 1 \\ 2 - \lambda_2 \end{pmatrix}.$$

Remark. We solved for y from one of the equations. This is
consistent with the other because $\lambda$ is an eigenvalue. Indeed, if you
substitute x = 1 and $y = 2 - \lambda$ on the left in the second equation, you get

$$3 + (1 - \lambda)(2 - \lambda) = 0$$

because $\lambda$ is a root of the characteristic polynomial.

Then $X(\lambda_1)$ is a basis for the one-dimensional eigenspace of $\lambda_1$, and
$X(\lambda_2)$ is a basis for the one-dimensional eigenspace of $\lambda_2$.
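The complex computation of Example 6 can be reproduced with floating-point complex numbers. This is only a numerical sketch: it takes the matrix with rows (2, -1) and (3, 1) read off from the equations of the example, forms the two roots of $t^2 - 3t + 5$ by the quadratic formula, and measures how far $AX$ is from $\lambda X$ for the eigenvector $(1, 2 - \lambda)$.

```python
import cmath

A = [[2, -1], [3, 1]]   # matrix of Example 6

# Roots of t^2 - 3t + 5 by the quadratic formula; the discriminant is negative.
disc = cmath.sqrt(9 - 20)
eigenvalues = [(3 + disc) / 2, (3 - disc) / 2]

def eigenvector(lam):
    """X(lam) = (1, 2 - lam), obtained by setting x = 1 in the first equation."""
    return [1, 2 - lam]

def residual(A, lam, X):
    """Largest coordinate of |AX - lam*X|; zero up to rounding for an eigenvector."""
    AX = [sum(a * x for a, x in zip(row, X)) for row in A]
    return max(abs(ax - lam * x) for ax, x in zip(AX, X))

residuals = [residual(A, lam, eigenvector(lam)) for lam in eigenvalues]
print(eigenvalues, residuals)
```

The residuals are zero up to rounding, which is the numerical counterpart of the Remark: the second equation holds automatically because $\lambda$ is a root of the characteristic polynomial.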

Example 7. Find the eigenvalues and a basis for the eigenspaces of the
matrix

$$A = \begin{pmatrix} 1 & 1 & -1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}.$$

We compute the characteristic polynomial, which is the determinant

$$\begin{vmatrix} t - 1 & -1 & 1 \\ 0 & t - 1 & 0 \\ -1 & 0 & t - 1 \end{vmatrix}$$

easily computed to be

$$P(t) = (t - 1)(t^2 - 2t + 2).$$

Now we meet the problem of finding the roots of P(t) as real numbers
or complex numbers. By the quadratic formula, the roots of $t^2 - 2t + 2$
are given by

$$\frac{2 \pm \sqrt{4 - 8}}{2} = 1 \pm \sqrt{-1}.$$

The whole theory of linear algebra could have been done over the
complex numbers, and the eigenvalues of the given matrix can also be
defined over the complex numbers. Then from the computation of the
roots above, we see that the only real eigenvalue is 1; and that there are
two complex eigenvalues, namely

$$1 + \sqrt{-1} \quad\text{and}\quad 1 - \sqrt{-1}.$$

We let these eigenvalues be

$$\lambda_1 = 1, \quad \lambda_2 = 1 + \sqrt{-1}, \quad \lambda_3 = 1 - \sqrt{-1}.$$

Let

$$X = \begin{pmatrix} x \\ y \\ z \end{pmatrix}$$

be a non-zero vector. Then X is an eigenvector for A if and only if the
following equations are satisfied with some eigenvalue $\lambda$:

$$x + y - z = \lambda x,$$
$$y = \lambda y,$$
$$x + z = \lambda z.$$

This system is equivalent with

$$(1 - \lambda)x + y - z = 0,$$
$$(1 - \lambda)y = 0,$$
$$x + (1 - \lambda)z = 0.$$

Case 1. $\lambda = 1$. Then the second equation will hold for any value of y.
Let us put y = 1. From the first equation we get z = 1, and from the
third equation we get x = 0. Hence we get a first eigenvector

$$X^1 = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}.$$

Case 2. $\lambda \neq 1$. Then from the second equation we must have y = 0.
Now we can solve the system arising from the first and third equations:

$$(1 - \lambda)x - z = 0,$$
$$x + (1 - \lambda)z = 0.$$

If these equations were independent, then the only solutions would be
x = z = 0. This cannot be the case, since there must be a non-zero
eigenvector with the given eigenvalue. Actually you can check directly that
the second equation is equal to $(\lambda - 1)$ times the first. In any case, we
give one of the variables an arbitrary value, and solve for the other. For
instance, let z = 1. Then $x = 1/(1 - \lambda)$. Thus we get the eigenvector

$$X(\lambda) = \begin{pmatrix} 1/(1 - \lambda) \\ 0 \\ 1 \end{pmatrix}.$$

We can substitute $\lambda = \lambda_2$ and $\lambda = \lambda_3$ to get the eigenvectors with the
eigenvalues $\lambda_2$ and $\lambda_3$ respectively.

In this way we have found three eigenvectors with distinct eigenvalues,
namely

$$X^1, \quad X(\lambda_2), \quad X(\lambda_3).$$
Example 8. Find the eigenvalues and a basis for the eigenspaces of the
matrix

$$A = \begin{pmatrix} 1 & -1 & 2 \\ -2 & 1 & 3 \\ 1 & -1 & 1 \end{pmatrix}.$$

The characteristic polynomial is

$$\begin{vmatrix} t - 1 & 1 & -2 \\ 2 & t - 1 & -3 \\ -1 & 1 & t - 1 \end{vmatrix} = (t - 1)^3 - (t - 1) - 1.$$

The eigenvalues are the roots of this cubic equation. In general it is not
easy to find such roots, and this is the case in the present instance. Let
u = t - 1. In terms of u the polynomial can be written

$$Q(u) = u^3 - u - 1.$$

From arithmetic, the only rational roots must be integers, and must
divide 1, so the only possible rational roots are $\pm 1$, which are not
roots. Hence there is no rational eigenvalue. But a cubic equation has
the general shape as shown on the figure:

[Figure 1]

This means that there is at least one real root. If you know calculus,
then you have tools to determine the relative maximum and relative
minimum; you will find that the function $u^3 - u - 1$ has its relative
maximum at $u = -1/\sqrt{3}$, and that $Q(-1/\sqrt{3})$ is negative. Hence
there is only one real root. The other two roots are complex. This is as
far as we are able to go with the means at hand. In any case, we give
these roots a name, and let the eigenvalues be

$$\lambda_1, \ \lambda_2, \ \lambda_3.$$

They are all distinct.

We can, however, find the eigenvectors in terms of the eigenvalues.
Let

$$X = \begin{pmatrix} x \\ y \\ z \end{pmatrix}$$

be a non-zero vector. Then X is an eigenvector if and only if $AX = \lambda X$,
that is:

$$x - y + 2z = \lambda x,$$
$$-2x + y + 3z = \lambda y,$$
$$x - y + z = \lambda z.$$

This system of equations is equivalent with

$$(1 - \lambda)x - y + 2z = 0,$$
$$-2x + (1 - \lambda)y + 3z = 0,$$
$$x - y + (1 - \lambda)z = 0.$$

We give z an arbitrary value, say z = 1 and solve for x and y using the
first two equations. Thus we must solve:

$$(\lambda - 1)x + y = 2,$$
$$2x + (\lambda - 1)y = 3.$$

Multiply the first equation by 2, the second by $(\lambda - 1)$ and subtract.
Then we can solve for y to get

$$y(\lambda) = \frac{3(\lambda - 1) - 4}{(\lambda - 1)^2 - 2}.$$

From the first equation we find

$$x(\lambda) = \frac{2 - y}{\lambda - 1}.$$

Hence eigenvectors are

$$X(\lambda_1) = \begin{pmatrix} x(\lambda_1) \\ y(\lambda_1) \\ 1 \end{pmatrix}, \quad X(\lambda_2) = \begin{pmatrix} x(\lambda_2) \\ y(\lambda_2) \\ 1 \end{pmatrix}, \quad X(\lambda_3) = \begin{pmatrix} x(\lambda_3) \\ y(\lambda_3) \\ 1 \end{pmatrix},$$

where $\lambda_1, \lambda_2, \lambda_3$ are the three eigenvalues. This is an explicit answer to
the extent that you are able to determine these eigenvalues. By machine
or a computer, you can use means to get approximations to $\lambda_1, \lambda_2, \lambda_3$
which will give you corresponding approximations to the three
eigenvectors. Observe that we have found here the complex eigenvectors. Let $\lambda_1$
be the real eigenvalue (we have seen that there is only one). Then from
the formulas for the coordinates of $X(\lambda)$, we see that $y(\lambda)$ or $x(\lambda)$ will be
real if and only if $\lambda$ is real. Hence there is only one real eigenvector
namely $X(\lambda_1)$. The other two eigenvectors are complex. Each
eigenvector is a basis for the corresponding eigenspace.
VIII, §2. EXERCISES

1. Let A be a diagonal matrix, 



(a) What is the characteristic polynomial of A? 

(b) What are its eigenvalues? 

2. Let A be a triangular matrix, 



What is the characteristic polynomial of A, and what are its eigenvalues? 

Find the characteristic polynomial, eigenvalues, and bases for the eigenspaces 
of the following matrices. 



5. Find the eigenvalues and eigenvectors of the following matrices. Show that 
the eigenvectors form a 1-dimensional space. 



6. Find the eigenvalues and eigenvectors of the following matrices. Show that
the eigenvectors form a 1-dimensional space.

$$\text{(a)} \ \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix} \qquad \text{(b)} \ \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}$$

7. Find the eigenvalues and a basis for the eigenspaces of the following
matrices.



8. Find the eigenvalues and a basis for the eigenspaces for the following 
matrices. 



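$$\text{(d)} \ \begin{pmatrix} -1 & 2 & 2 \\ 2 & 2 & 2 \\ -3 & -6 & -6 \end{pmatrix} \qquad \text{(e)} \ \begin{pmatrix} 3 & 2 & 1 \\ 0 & 1 & 2 \\ 0 & 1 & -1 \end{pmatrix} \qquad \text{(f)} \ \begin{pmatrix} -1 & 4 & -2 \\ -3 & 4 & 0 \\ 3 & 1 & 3 \end{pmatrix}$$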

9. Let V be an n-dimensional vector space and assume that the characteristic
polynomial of a linear map $A : V \to V$ has n distinct roots. Show that V has
a basis consisting of eigenvectors of A.

10. Let A be a square matrix. Show that the eigenvalues of ${}^tA$ are the same as
those of A.

11. Let A be an invertible matrix. If $\lambda$ is an eigenvalue of A show that $\lambda \neq 0$
and that $\lambda^{-1}$ is an eigenvalue of $A^{-1}$.

12. Let V be the space generated over R by the two functions sin t and cos t.
Does the derivative (viewed as a linear map of V into itself) have any
non-zero eigenvectors in V? If so, which?

13. Let D denote the derivative which we view as a linear map on the space of
differentiable functions. Let k be an integer $\neq 0$. Show that the functions
sin kx and cos kx are eigenvectors for $D^2$. What are the eigenvalues?

14. Let $A: V \to V$ be a linear map of V into itself, and let $\{v_1, \ldots, v_n\}$ be a basis of
V consisting of eigenvectors having distinct eigenvalues $c_1, \ldots, c_n$. Show that
any eigenvector v of A in V is a scalar multiple of some $v_i$.

15. Let A, B be square matrices of the same size. Show that the eigenvalues of
AB are the same as the eigenvalues of BA.

VIII, §3. EIGENVALUES AND EIGENVECTORS OF 
SYMMETRIC MATRICES 

We shall give two proofs of the following theorem. 

Theorem 3.1. Let A be a symmetric $n \times n$ real matrix. Then there
exists a non-zero real eigenvector for A.

The first proof uses the complex numbers. By Theorem 2.3, we know
that A has an eigenvalue $\lambda$ in C, and an eigenvector Z with complex
components. It will now suffice to prove:

Theorem 3.2. Let A be a real symmetric matrix and let $\lambda$ be an
eigenvalue in C. Then $\lambda$ is real. If $Z \neq O$ is a complex eigenvector with
eigenvalue $\lambda$, and $Z = X + iY$ where $X, Y \in \mathbf{R}^n$, then both X, Y are real
eigenvectors of A with eigenvalue $\lambda$, and X or $Y \neq O$.

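Proof. Let $Z = {}^t(z_1, \ldots, z_n)$ with complex coordinates $z_i$. Then

$${}^t\bar{Z}Z = \bar{z}_1 z_1 + \cdots + \bar{z}_n z_n = |z_1|^2 + \cdots + |z_n|^2 > 0.$$

By hypothesis, we have $AZ = \lambda Z$. Then

$${}^t\bar{Z}AZ = {}^t\bar{Z}\lambda Z = \lambda\, {}^t\bar{Z}Z.$$

The transpose of a $1 \times 1$ matrix is equal to itself, so we also get

$${}^tZ\, {}^t\!A \bar{Z} = {}^t\bar{Z}AZ = \lambda\, {}^t\bar{Z}Z.$$

But $\overline{AZ} = \bar{A}\bar{Z} = A\bar{Z}$ and $\overline{AZ} = \overline{\lambda Z} = \bar{\lambda}\bar{Z}$. Since ${}^t\!A = A$, therefore

$$\lambda\, {}^t\bar{Z}Z = \bar{\lambda}\, {}^tZ\bar{Z}.$$

Since ${}^t\bar{Z}Z = {}^tZ\bar{Z} \neq 0$ it follows that $\lambda = \bar{\lambda}$, so $\lambda$ is real.

Now from $AZ = \lambda Z$ we get

$$AX + iAY = \lambda X + i\lambda Y,$$

and since $\lambda$, X, Y are real it follows that $AX = \lambda X$ and $AY = \lambda Y$. This
proves the theorem.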

Next we shall give a proof using calculus of several variables.

Define the function

$$f(X) = {}^tXAX \quad\text{for } X \in \mathbf{R}^n.$$

Such a function f is called the quadratic form associated with A. If
${}^tX = (x_1, \ldots, x_n)$ is written in terms of coordinates, and $A = (a_{ij})$ then

$$f(X) = \sum_{i,j=1}^{n} a_{ij} x_i x_j.$$

Example. Let

$$A = \begin{pmatrix} 3 & -1 \\ -1 & 2 \end{pmatrix}.$$

Let ${}^tX = (x, y)$. Then

$${}^tXAX = (x, y)\begin{pmatrix} 3 & -1 \\ -1 & 2 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = 3x^2 - 2xy + 2y^2.$$

More generally, let

$$A = \begin{pmatrix} a & b \\ b & d \end{pmatrix}.$$

Then

$$(x, y)\begin{pmatrix} a & b \\ b & d \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = ax^2 + 2bxy + dy^2.$$

Example. Suppose we are given a quadratic expression

$$f(x, y) = 3x^2 + 5xy - 4y^2.$$

Then it is the quadratic form associated with the symmetric matrix

$$A = \begin{pmatrix} 3 & 5/2 \\ 5/2 & -4 \end{pmatrix}.$$

In many applications, one wants to find a maximum for such a
function f on the unit sphere. Recall that the unit sphere is the set of all
points X such that $\|X\| = 1$, where $\|X\| = \sqrt{X \cdot X}$. It is shown in
analysis courses that a continuous function f as above necessarily has a
maximum on the sphere. A maximum on the unit sphere is a point P such
that $\|P\| = 1$ and

$$f(P) \geq f(X) \quad\text{for all } X \text{ with } \|X\| = 1.$$

The next theorem relates this problem with the problem of finding
eigenvectors.


Theorem 3.3. Let A be a real symmetric matrix, and let $f(X) = {}^tXAX$
be the associated quadratic form. Let P be a point on the unit sphere
such that f(P) is a maximum for f on the sphere. Then P is an
eigenvector for A. In other words, there exists a number $\lambda$ such that
$AP = \lambda P$.

Proof. Let W be the subspace of $\mathbf{R}^n$ orthogonal to P, that is $W = P^\perp$.
Then dim W = n - 1. For any element $w \in W$, $\|w\| = 1$, define the curve

$$C(t) = (\cos t)P + (\sin t)w.$$

The directions of unit vectors $w \in W$ are the directions tangent to the
sphere at the point P, as shown on the figure.

The curve C(t) lies on the sphere because $\|C(t)\| = 1$, as you can verify
at once by taking the dot product $C(t) \cdot C(t)$, and using the hypothesis
that $P \cdot w = 0$. Furthermore, C(0) = P, so C(t) is a curve on the sphere
passing through P. We also have the derivative

$$C'(t) = (-\sin t)P + (\cos t)w,$$

and so C'(0) = w. Thus the direction of the curve is in the direction of
w, and is perpendicular to P (hence tangent to the sphere at P) because
$w \cdot P = 0$. Consider the function

$$g(t) = f(C(t)) = C(t) \cdot AC(t).$$

Using coordinates, and the rule for the derivative of a product which
applies in this case (as you might know from calculus), you find the
derivative:

$$g'(t) = C'(t) \cdot AC(t) + C(t) \cdot AC'(t) = 2C'(t) \cdot AC(t),$$

because A is symmetric. Since f(P) is a maximum and g(0) = f(P), it
follows that g'(0) = 0. Then we obtain:

$$0 = g'(0) = 2C'(0) \cdot AC(0) = 2w \cdot AP.$$

Hence AP is perpendicular to w for all $w \in W$, that is, AP is perpendicular
to W. But $W^\perp$ is the 1-dimensional space generated by P. Hence there is a
number $\lambda$ such that $AP = \lambda P$, thus proving the theorem.

Corollary 3.4. The maximum value of f on the unit sphere is equal to
the largest eigenvalue of A.

Proof. Let $\lambda$ be any eigenvalue and let P be an eigenvector on the
unit sphere, so $\|P\| = 1$. Then

$$f(P) = {}^tPAP = {}^tP\lambda P = \lambda\, {}^tPP = \lambda.$$

Thus the value of f at an eigenvector on the unit sphere is equal to the
eigenvalue. Theorem 3.3 tells us that the maximum of f on the unit
sphere occurs at an eigenvector. Hence the maximum of f on the unit
sphere is equal to the largest eigenvalue, as asserted.

Example. Let $f(x, y) = 2x^2 - 3xy + y^2$. Let A be the symmetric
matrix associated with f. Find the eigenvectors of A on the unit circle, and
find the maximum of f on the unit circle.

First we note that f is the quadratic form associated with the matrix

$$A = \begin{pmatrix} 2 & -3/2 \\ -3/2 & 1 \end{pmatrix}.$$

By Theorem 3.3 a maximum must occur at an eigenvector, so we first
find the eigenvalues and eigenvectors.

The characteristic polynomial is the determinant

$$\begin{vmatrix} t - 2 & 3/2 \\ 3/2 & t - 1 \end{vmatrix} = t^2 - 3t - \tfrac{1}{4}.$$

Then the eigenvalues are

$$\lambda = \frac{3 \pm \sqrt{10}}{2}.$$

For the eigenvectors, we must solve

$$2x - \tfrac{3}{2}y = \lambda x,$$
$$-\tfrac{3}{2}x + y = \lambda y.$$

Putting x = 1 this gives the possible eigenvectors

$$X(\lambda) = \begin{pmatrix} 1 \\ \tfrac{2}{3}(2 - \lambda) \end{pmatrix}.$$

Thus there are two such eigenvectors, up to non-zero scalar multiples.
The eigenvectors lying on the unit circle are therefore

$$P(\lambda) = \frac{X(\lambda)}{\|X(\lambda)\|} \quad\text{with}\quad \lambda = \frac{3 + \sqrt{10}}{2} \quad\text{and}\quad \lambda = \frac{3 - \sqrt{10}}{2}.$$

By Corollary 3.4 the maximum is the point with the bigger eigenvalue,
and must therefore be the point

$$P(\lambda) \quad\text{with}\quad \lambda = \frac{3 + \sqrt{10}}{2}.$$

The maximum value of f on the unit circle is $(3 + \sqrt{10})/2$.

By the same token, the minimum value of f on the unit circle is

$$(3 - \sqrt{10})/2.$$
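Corollary 3.4 can be tested empirically on this example. The sketch below samples $f(x, y) = 2x^2 - 3xy + y^2$ at many points $(\cos s, \sin s)$ of the unit circle and compares the largest and smallest sampled values with the two eigenvalues. The number of sample points is an arbitrary choice, so the agreement is only approximate.

```python
import math

def f(x, y):
    """The quadratic form of the example."""
    return 2 * x * x - 3 * x * y + y * y

# Evaluate f at N equally spaced points of the unit circle.
N = 100000
values = [f(math.cos(2 * math.pi * k / N), math.sin(2 * math.pi * k / N))
          for k in range(N)]
sampled_max = max(values)
sampled_min = min(values)

# Eigenvalues of the associated symmetric matrix, from the text.
lam_max = (3 + math.sqrt(10)) / 2
lam_min = (3 - math.sqrt(10)) / 2

print(sampled_max, lam_max)
print(sampled_min, lam_min)
```

The sampled extremes agree with the eigenvalues to many decimal places, as the corollary predicts.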


VIII, §3. EXERCISES 

1. Find the eigenvalues of the following matrices, and the maximum value of the 
associated quadratic forms on the unit circle. 



2. Same question, except find the maximum on the unit sphere. 



3. Find the maximum and minimum of the function 


$$f(x, y) = 3x^2 + 5xy - 4y^2$$

on the unit circle. 


VIII, §4. DIAGONALIZATION OF A SYMMETRIC 
LINEAR MAP 

Throughout this section, unless otherwise specified, we let V be a vector
space of dimension n over R, with a positive definite scalar product.

We shall give an application of the existence of eigenvectors proved in
§3. We let

$$A : V \to V$$

be a linear map. Recall that A is symmetric (with respect to the scalar
product) if we have the relation

$$\langle Av, w \rangle = \langle v, Aw \rangle$$

for all $v, w \in V$.

We can reformulate Theorem 3.1 as follows: 


Theorem 4.1. Let V be a finite dimensional vector space with a positive
definite scalar product. Let $A : V \to V$ be a symmetric linear map. Then
A has a non-zero eigenvector.

Let W be a subspace of V, and let $A : V \to V$ be a symmetric linear map.
We say that W is stable under A if $A(W) \subset W$, that is for all $u \in W$ we
have $Au \in W$. Sometimes one also says that W is invariant under A.

Theorem 4.2. Let $A: V \to V$ be a symmetric linear map. Let v be a
non-zero eigenvector of A. If w is an element of V, perpendicular to v,
then Aw is also perpendicular to v.

If W is a subspace of V which is stable under A, then $W^\perp$ is also
stable under A.


Proof. Suppose first that v is an eigenvector of A, with eigenvalue $\lambda$. Then

$$\langle Aw, v \rangle = \langle w, Av \rangle = \langle w, \lambda v \rangle = \lambda \langle w, v \rangle = 0.$$

Hence Aw is also perpendicular to v.

Second, suppose W is stable under A. Let $u \in W^\perp$. Then for all $w \in W$
we have:

$$\langle Au, w \rangle = \langle u, Aw \rangle = 0$$

by the assumption that $Aw \in W$. Hence $Au \in W^\perp$, thus proving the second
assertion.

Theorem 4.3 (Spectral theorem). Let V be a finite dimensional vector 
space over the real numbers, of dimension n > 0, and with a positive 
definite scalar product. Let 


A : V -> V 

be a linear map, symmetric with respect to the scalar product. Then V 
has an orthonormal basis consisting of eigenvectors. 

Proof. By Theorem 3.1, there exists a non-zero eigenvector v for A.
Let W be the one-dimensional space generated by v. Then W is stable
under A. By Theorem 4.2, $W^\perp$ is also stable under A and is a vector
space of dimension $n - 1$. We may then view A as giving a symmetric
linear map of $W^\perp$ into itself. We can then repeat the procedure. We put
$v = v_1$, and by induction we can find an orthogonal basis $\{v_2, \ldots, v_n\}$ of
$W^\perp$ consisting of eigenvectors. Then

$$\{v_1, v_2, \ldots, v_n\}$$

is an orthogonal basis of V consisting of eigenvectors. We divide each
vector by its norm to get an orthonormal basis, as desired.

If $\{e_1, \ldots, e_n\}$ is an orthonormal basis of V such that each $e_i$ is an
eigenvector, then the matrix of A with respect to this basis is diagonal,
and the diagonal elements are precisely the eigenvalues:

$$\begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}.$$

In such a simple representation, the effect of A then becomes much
clearer than when A is represented by a more complicated matrix with
respect to another basis.

A basis $\{v_1, \ldots, v_n\}$ such that each $v_i$ is an eigenvector for A is called a
spectral basis for A. We also say that this basis diagonalizes A, because
the matrix of A with respect to this basis is a diagonal matrix.

Example. We give an application to linear differential equations. Let
A be an n x n symmetric real matrix. We want to find the solutions in
$\mathbf{R}^n$ of the differential equation

$$\frac{dX(t)}{dt} = AX(t),$$

where

$$X(t) = \begin{pmatrix} x_1(t) \\ \vdots \\ x_n(t) \end{pmatrix}$$

is given in terms of coordinates which are functions of t, and

$$\frac{dX(t)}{dt} = \begin{pmatrix} dx_1/dt \\ \vdots \\ dx_n/dt \end{pmatrix}.$$

Writing this equation in terms of arbitrary coordinates is messy. So let
us forget at first about coordinates, and view $\mathbf{R}^n$ as an n-dimensional
vector space V with a positive definite scalar product. We choose an
orthonormal basis of V (usually different from the original basis) consisting
of eigenvectors of A. Now with respect to this new basis, we can identify
V with $\mathbf{R}^n$ with new coordinates which we denote by $y_1, \ldots, y_n$. With
respect to these new coordinates, the matrix of the linear map $L_A$ is

$$\begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}$$

where $\lambda_1, \ldots, \lambda_n$ are the eigenvalues. But in terms of these more
convenient coordinates, our differential equation simply reads

$$\frac{dy_1}{dt} = \lambda_1 y_1, \quad \ldots, \quad \frac{dy_n}{dt} = \lambda_n y_n.$$

Thus the most general solution is of the form

$$y_i(t) = c_i e^{\lambda_i t} \quad \text{with some constant } c_i.$$
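
The recipe of this example can be tried on a small case. The sketch below is only an illustration under stated assumptions: the matrix A = [[2, 1], [1, 2]] is our own choice (its eigenvalues 3 and 1 and its orthonormal eigenvectors are written in by hand), and the names `solve` and `residual` are hypothetical helpers. It builds X(t) from the coordinates y_i(t) = c_i e^{λ_i t} and compares dX/dt with AX by a central difference.

```python
import math

# Assumed example: A = [[2, 1], [1, 2]] has eigenvalues 3 and 1 with
# orthonormal eigenvectors u1 = (1, 1)/sqrt(2) and u2 = (1, -1)/sqrt(2).
A = [[2.0, 1.0], [1.0, 2.0]]
s = 1 / math.sqrt(2)
eigen = [(3.0, (s, s)), (1.0, (s, -s))]

def solve(x0, t):
    """Return X(t) for dX/dt = AX with X(0) = x0, using eigen-coordinates."""
    x = [0.0, 0.0]
    for lam, u in eigen:
        c = x0[0] * u[0] + x0[1] * u[1]   # c_i = <X(0), u_i>
        y = c * math.exp(lam * t)         # y_i(t) = c_i e^{lambda_i t}
        x[0] += y * u[0]
        x[1] += y * u[1]
    return x

def residual(x0, t, h=1e-6):
    """Largest entry of dX/dt - AX at time t, via a central difference."""
    xp, xm, x = solve(x0, t + h), solve(x0, t - h), solve(x0, t)
    dx = [(xp[i] - xm[i]) / (2 * h) for i in range(2)]
    ax = [A[i][0] * x[0] + A[i][1] * x[1] for i in range(2)]
    return max(abs(dx[i] - ax[i]) for i in range(2))
```

For instance `solve([1.0, 2.0], 0.0)` returns the initial vector, and `residual([1.0, 2.0], 0.5)` is small, limited only by the finite difference.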

The moral of this example is that one should not select a basis too 
quickly, and one should use as often as possible a notation without 
coordinates, until a choice of coordinates becomes imperative to make 
the solution of a problem simpler. 

Theorem 4.4. Let A be a symmetric real n x n matrix. Then there
exists an n x n real unitary matrix U such that

$${}^tUAU = U^{-1}AU$$

is a diagonal matrix.

Proof. We view A as the associated matrix of a symmetric linear map

$$F : \mathbf{R}^n \to \mathbf{R}^n$$

relative to the standard basis $\mathcal{B} = \{e^1, \ldots, e^n\}$. By Theorem 4.3 we can
find an orthonormal basis $\mathcal{B}' = \{w_1, \ldots, w_n\}$ of $\mathbf{R}^n$ such that

$$M^{\mathcal{B}'}_{\mathcal{B}'}(F)$$

is diagonal. Let $U = M^{\mathcal{B}'}_{\mathcal{B}}(\mathrm{id})$. Then $U^{-1}AU$ is diagonal. Furthermore
U is unitary. Indeed, let $U = (c_{ij})$. Then

$$w_i = \sum_{j=1}^{n} c_{ji} e^j \quad \text{for } i = 1, \ldots, n.$$

The conditions $\langle w_i, w_i\rangle = 1$ and $\langle w_i, w_j\rangle = 0$ if $i \neq j$ are immediately
seen to mean that

$${}^tUU = I, \quad \text{that is} \quad {}^tU = U^{-1}.$$

This proves Theorem 4.4.
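
For a 2 x 2 symmetric matrix the unitary matrix U can be written down as a rotation. The following Python sketch uses this closed form, which is a standard device and not the inductive argument of the proof; the function name `diagonalize_sym2` and the sample matrix are our own assumptions.

```python
import math

def diagonalize_sym2(a, b, d):
    """For A = [[a, b], [b, d]], return (U, D) with U a rotation and D = tU A U."""
    # The angle is chosen so that the off-diagonal entry of tU A U vanishes.
    theta = 0.5 * math.atan2(2 * b, a - d)
    c, s = math.cos(theta), math.sin(theta)
    U = [[c, -s], [s, c]]            # columns are orthonormal eigenvectors
    A = [[a, b], [b, d]]
    AU = [[sum(A[i][k] * U[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    D = [[sum(U[k][i] * AU[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    return U, D

U, D = diagonalize_sym2(2.0, -1.5, 1.0)
```

Here `D` is diagonal up to rounding, and its diagonal entries are the eigenvalues of the sample matrix.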

Remark. Theorem 4.4 shows us how to obtain all symmetric real 
matrices. Every symmetric real matrix A can be written in the form 

$${}^tUBU,$$

where B is a diagonal matrix and U is real unitary. 


VIII, §4. EXERCISES 

1. Suppose that A is a diagonal n x n matrix. For any $X \in \mathbf{R}^n$, what is ${}^tXAX$ in
terms of the coordinates of X and the diagonal elements of A?

2. Let

$$A = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}$$

be a diagonal matrix with $\lambda_1 \geq 0, \ldots, \lambda_n \geq 0$. Show that there exists an n x n
diagonal matrix B such that $B^2 = A$.

3. Let V be a finite dimensional vector space with a positive definite scalar
product. Let $A : V \to V$ be a symmetric linear map. We say that A is positive
definite if $\langle Av, v\rangle > 0$ for all $v \in V$ and $v \neq O$. Prove:

(a) If A is positive definite, then all eigenvalues are > 0.

(b) If A is positive definite, then there exists a symmetric linear map B such
that $B^2 = A$ and $BA = AB$. What are the eigenvalues of B? [Hint: Use
a basis of V consisting of eigenvectors.]

4. We say that A is semipositive if $\langle Av, v\rangle \geq 0$ for all $v \in V$. Prove the
analogues of (a), (b) of Exercise 3 when A is only assumed semipositive. Thus the
eigenvalues are $\geq 0$, and there exists a symmetric linear map B such that
$B^2 = A$.

5. Assume that A is symmetric positive definite. Show that $A^2$ and $A^{-1}$ are
symmetric positive definite.

6. Let $A : \mathbf{R}^n \to \mathbf{R}^n$ be an invertible linear map.

(i) Show that ${}^tAA$ is symmetric positive definite.

(ii) By Exercise 3(b), there is a symmetric positive definite B such that
$B^2 = {}^tAA$. Let $U = AB^{-1}$. Show that U is unitary.

(iii) Show that $A = UB$.

7. Let B be symmetric positive definite and also unitary. Show that $B = I$.

8. Prove that a symmetric real matrix A is positive definite if and only if there
exists a non-singular real matrix N such that $A = {}^tNN$. [Hint: Use Theorem
4.4, and write ${}^tUAU$ as the square of a diagonal matrix, say $B^2$. Let
$N = (UB^{-1})^{-1} = BU^{-1}$.]

9. Find an orthogonal basis of $\mathbf{R}^2$ consisting of eigenvectors of the given matrix.



10. Let A be a symmetric 2 x 2 real matrix. Show that if the eigenvalues of A
are distinct, then their eigenvectors form an orthogonal basis of $\mathbf{R}^2$.

11. Let V be as in §4. Let $A : V \to V$ be a symmetric linear map. Let $v_1, v_2$ be
eigenvectors of A with eigenvalues $\lambda_1, \lambda_2$ respectively. If $\lambda_1 \neq \lambda_2$, show that
$v_1$ is perpendicular to $v_2$.

12. Let V be as in §4. Let $A : V \to V$ be a symmetric linear map. If A has only
one eigenvalue, show that every orthogonal basis of V consists of eigenvectors
of A.

13. Let V be as in §4. Let $A : V \to V$ be a symmetric linear map. Let $\dim V = n$,
and assume that there are n distinct eigenvalues of A. Show that their
eigenvectors form an orthogonal basis of V.

14. Let V be as in §4. Let $A : V \to V$ be a symmetric linear map. If the kernel of
A is {O}, then no eigenvalue of A is equal to 0, and conversely.

15. Let V be as in §4, and let $A : V \to V$ be a symmetric linear map. Prove that
the following conditions on A imply each other.

(a) All eigenvalues of A are > 0.

(b) For all elements $v \in V$, $v \neq O$, we have $\langle Av, v\rangle > 0$.

If the map A satisfies these conditions, it is said to be positive definite. Thus
the second condition, in terms of coordinate vectors and the ordinary scalar
product in $\mathbf{R}^n$ reads:

(b') For all vectors $X \in \mathbf{R}^n$, $X \neq O$, we have

$${}^tXAX > 0.$$

16. Determine which of the following matrices are positive definite.



17. Prove that the following conditions concerning a real symmetric matrix are
equivalent. A matrix satisfying these conditions is called negative definite.

(a) All eigenvalues of A are < 0.

(b) For all vectors $X \in \mathbf{R}^n$, $X \neq O$, we have ${}^tXAX < 0$.

18. Let A be an n x n non-singular real symmetric matrix. Prove the following
statements.

(a) If $\lambda$ is an eigenvalue of A, then $\lambda \neq 0$.

(b) If $\lambda$ is an eigenvalue of A, then $\lambda^{-1}$ is an eigenvalue of $A^{-1}$.

(c) The matrices A and $A^{-1}$ have the same set of eigenvectors.

19. Let A be a symmetric positive definite real matrix. Show that $A^{-1}$ exists and
is positive definite.

20. Let V be as in §4. Let A and B be two symmetric operators of V such that
AB = BA. Show that there exists an orthogonal basis of V which consists of
eigenvectors for both A and B. [Hint: If $\lambda$ is an eigenvalue of A, and $V_\lambda$
consists of all $v \in V$ such that $Av = \lambda v$, show that $BV_\lambda$ is contained in $V_\lambda$.
This reduces the problem to the case when $A = \lambda I$.]

21. Let V be as in §4, and let $A : V \to V$ be a symmetric operator. Let $\lambda_1, \ldots, \lambda_r$
be the distinct eigenvalues of A. If $\lambda$ is an eigenvalue of A, let $V_\lambda(A)$ consist
of the set of all $v \in V$ such that $Av = \lambda v$.

(a) Show that $V_\lambda(A)$ is a subspace of V, and that A maps $V_\lambda(A)$ into itself.
We call $V_\lambda(A)$ the eigenspace of A belonging to $\lambda$.

(b) Show that V is the direct sum of the spaces

$$V = V_{\lambda_1}(A) \oplus \cdots \oplus V_{\lambda_r}(A).$$

This means that each element $v \in V$ has a unique expression as a sum
$v = v_1 + \cdots + v_r$ with $v_i \in V_{\lambda_i}$.

(c) Let $\lambda_1, \lambda_2$ be two distinct eigenvalues. Show that $V_{\lambda_1}$ is orthogonal to
$V_{\lambda_2}$.

22. If $P_1, P_2$ are two symmetric positive definite real matrices (of the same size),
and t, u are positive real numbers, show that $tP_1 + uP_2$ is symmetric positive
definite.

23. Let V be as in §4, and let $A : V \to V$ be a symmetric operator. Let $\lambda_1, \ldots, \lambda_r$
be the distinct eigenvalues of A. Show that

$$(A - \lambda_1 I) \cdots (A - \lambda_r I) = O.$$

24. Let V be as in §4, and let $A : V \to V$ be a symmetric operator. A subspace W
of V is said to be invariant or stable under A if $Aw \in W$ for all $w \in W$, i.e.
$AW \subset W$. Prove that if A has no invariant subspace other than O and V,
then $A = \lambda I$ for some number $\lambda$. [Hint: Show first that A has only one
eigenvalue.]

25. (For those who have read Sylvester's theorem.) Let $A : V \to V$ be a symmetric
linear map. Referring back to Sylvester's theorem, show that the index of
nullity of the form

$$(v, w) \mapsto \langle Av, w\rangle$$

is equal to the dimension of the kernel of A. Show that the index of positivity
is equal to the number of eigenvectors in a spectral basis having a positive
eigenvalue.

VIII, §5. THE HERMITIAN CASE 

Throughout this section we let V be a finite dimensional vector space
over C with a positive definite hermitian product.

That the hermitian case is actually not only analogous but almost the
same as the real case is already shown by the next result.

Theorem 5.1. Let $A : V \to V$ be a hermitian operator. Then every
eigenvalue of A is real.

Proof. Let v be an eigenvector with an eigenvalue $\lambda$. By Theorem 2.4
of Chapter VII we know that $\langle Av, v\rangle$ is real. Since $Av = \lambda v$, we find

$$\langle Av, v\rangle = \lambda\langle v, v\rangle.$$

But $\langle v, v\rangle$ is real > 0 by assumption. Hence $\lambda$ is real, thus proving the
theorem.
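
Theorem 5.1 can be observed on a 2 x 2 example. The sketch below is an illustration only: the hermitian matrix is our own choice, and `eigenvalues_2x2` is a hypothetical helper that solves the characteristic polynomial by the quadratic formula over C.

```python
import cmath

def eigenvalues_2x2(a, b, c, d):
    """Roots of t^2 - (a + d)t + (ad - bc), for the matrix [[a, b], [c, d]]."""
    tr, det = a + d, a * d - b * c
    root = cmath.sqrt(tr * tr - 4 * det)
    return (tr + root) / 2, (tr - root) / 2

# Assumed hermitian example: real diagonal entries, and off-diagonal
# entries which are complex conjugates of each other.
b = 1 + 2j
lam1, lam2 = eigenvalues_2x2(2, b, b.conjugate(), -1)
```

Both `lam1` and `lam2` have zero imaginary part, although the matrix itself has non-real entries.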

Over C we know that every operator has an eigenvector and an
eigenvalue. Thus the analogue of Theorem 4.1 is taken care of in the
present case. We then have the analogues of Theorems 4.2 and 4.3 as
follows.

Theorem 5.2. Let $A : V \to V$ be a hermitian operator. Let v be a
non-zero eigenvector of A. If w is an element of V perpendicular to v, then
Aw is also perpendicular to v.

If W is a subspace of V which is stable under A, then $W^\perp$ is also
stable under A.


The proof is the same as that of Theorem 4.2.

Theorem 5.3 (Spectral theorem). Let $A : V \to V$ be a hermitian linear
map. Then V has an orthogonal basis consisting of eigenvectors of A.

Again the proof is the same as that of Theorem 4.3.

Remark. If $\{v_1, \ldots, v_n\}$ is a basis as in the theorem, then the matrix of
A relative to this basis is a real diagonal matrix. This means that the
theory of hermitian maps (or matrices) can be handled just like the real
case.

Theorem 5.4. Let A be an n x n complex hermitian matrix. Then there
exists a complex unitary matrix U such that

$$U^*AU = U^{-1}AU$$

is a diagonal matrix.

The proof is like that of Theorem 4.4. 


VIII, §5. EXERCISES 

Throughout these exercises, we assume that V is a finite dimensional vector space
over C, with a positive definite hermitian product. Also, we assume dim V > 0.
Let $A : V \to V$ be a hermitian operator. We define A to be positive definite if

$$\langle Av, v\rangle > 0 \quad \text{for all } v \in V,\ v \neq O.$$

Also we define A to be semipositive or semidefinite if

$$\langle Av, v\rangle \geq 0 \quad \text{for all } v \in V.$$

1. Prove:

(a) If A is positive definite then all eigenvalues are > 0.

(b) If A is positive definite, then there exists a hermitian linear map B such
that $B^2 = A$ and $BA = AB$. What are the eigenvalues of B? [Hint: See
Exercise 3 of §4.]

2. Prove the analogues of (a) and (b) in Exercise 1 when A is only assumed to
be semidefinite.

3. Assume that A is hermitian positive definite. Show that $A^2$ and $A^{-1}$ are
hermitian positive definite.

4. Let $A : V \to V$ be an arbitrary invertible operator. Show that there exist a
complex unitary operator U and a hermitian positive definite operator P
such that $A = UP$. [Hint: Let P be a hermitian positive definite operator
such that $P^2 = A^*A$. Let $U = AP^{-1}$. Show that U is unitary.]

5. Let A be a non-singular complex matrix. Show that A is hermitian positive
definite if and only if there exists a non-singular matrix N such that

$$A = N^*N.$$


6. Show that the matrix 



is semipositive, and find a square root. 

7. Find a unitary matrix U such that U*AU is diagonal, when A is equal to: 




8. Let $A : V \to V$ be a hermitian operator. Show that there exist semipositive
operators $P_1, P_2$ such that $A = P_1 - P_2$.

9. An operator $A : V \to V$ is said to be normal if $AA^* = A^*A$.

(a) Let A, B be normal operators such that AB = BA. Show that AB is
normal.

(b) If A is normal, state and prove a spectral theorem for A. [Hint for the
proof: Find a common eigenvector for A and $A^*$.]

10. Show that the complex matrix 



is normal, but is not hermitian and is not unitary. 


VIII, §6. UNITARY OPERATORS 

In the spectral theorem of the preceding section we have found an
orthogonal basis for the vector space, consisting of eigenvectors for a
hermitian operator. We shall now treat the analogous case for a unitary
operator.

The complex case is easier and clearer, so we start with the complex 
case. The real case will be treated afterwards. 


We let V be a finite dimensional vector space over C with a positive 
definite hermitian scalar product. 

We let

$$U : V \to V$$

be a unitary operator. This means that U satisfies any one of the following
equivalent conditions:

U preserves norms, i.e. $\|Uv\| = \|v\|$ for all $v \in V$.

U preserves scalar products, i.e. $\langle Uv, Uw\rangle = \langle v, w\rangle$ for $v, w \in V$.

U maps unit vectors on unit vectors.

Since we are over the complex numbers, we know that U has an
eigenvector v with an eigenvalue $\lambda \neq 0$ (because U is invertible). The
one-dimensional subspace generated by v is an invariant (we also say stable)
subspace.

Lemma 6.1. Let W be a U-invariant subspace of V. Then $W^\perp$ is also
U-invariant.

Proof. Let $v \in W^\perp$ so that $\langle w, v\rangle = 0$ for all $w \in W$. Recall that
$U^* = U^{-1}$. Since $U : W \to W$ maps W into itself and since U has kernel
{O}, it follows that $U^{-1}$ maps W into itself also. Now

$$\langle w, Uv\rangle = \langle U^*w, v\rangle = \langle U^{-1}w, v\rangle = 0,$$

thus proving our lemma.

Theorem 6.2. Let V be a non-zero finite dimensional vector space over
the complex numbers, with a positive definite hermitian product. Let
$U : V \to V$ be a unitary operator. Then V has an orthogonal basis
consisting of eigenvectors of U.

Proof. Let $v_1$ be a non-zero eigenvector, and let $V_1$ be the 1-dimensional
space generated by $v_1$. Just as in Lemma 6.1, we see that the
orthogonal complement $V_1^\perp$ is U-invariant, and by induction, we can find
an orthogonal basis $\{v_2, \ldots, v_n\}$ of $V_1^\perp$ consisting of eigenvectors for U.
Then $\{v_1, \ldots, v_n\}$ is the desired basis of V.

Next we deal with the real case. 

Theorem 6.3. Let V be a finite dimensional vector space over the reals,
of dimension > 0, and with a positive definite scalar product. Let T be
a real unitary operator on V. Then V can be expressed as a direct sum

$$V = V_1 \oplus \cdots \oplus V_r$$

of T-invariant subspaces, which are mutually orthogonal (i.e. $V_i$ is
orthogonal to $V_j$ if $i \neq j$) and $\dim V_i$ is 1 or 2, for each i.

Proof. After picking an orthonormal basis for V over R, we may
assume that $V = \mathbf{R}^n$ and that the positive definite scalar product is the
ordinary dot product. We can then represent T by a matrix, which we
denote by M. Then M is a unitary matrix.

Now we view M as operating on $\mathbf{C}^n$. Since M is real and ${}^tM = M^{-1}$,
we also get

$$M^* = M^{-1}$$

so M is also complex unitary.

Let Z be a non-zero eigenvector of M in $\mathbf{C}^n$ with eigenvalue $\lambda$, so

$$MZ = \lambda Z.$$

Since $\|MZ\| = \|Z\|$ it follows that $|\lambda| = 1$. Hence there exists a real
number $\theta$ such that $\lambda = e^{i\theta}$. Thus in fact we have

$$MZ = e^{i\theta}Z.$$

We write

$$Z = X + iY \quad \text{with } X, Y \in \mathbf{R}^n.$$

Case 1. $\lambda = e^{i\theta}$ is real, so $e^{i\theta} = 1$ or $-1$. Then

$$MX = \lambda X \quad \text{and} \quad MY = \lambda Y.$$

Since $Z \neq O$ it follows that at least one of X, Y is $\neq O$. Thus we have
found a non-zero eigenvector v for T. Then we follow the usual procedure.
We let $V_1 = (v)$ be the subspace generated by v over R. Then

$$V = V_1 \oplus V_1^\perp.$$

Lemma 6.1 applies to the real case as well, so T maps $V_1^\perp$ into $V_1^\perp$. We
can then apply induction to conclude the proof.

Case 2. $\lambda = e^{i\theta}$ is not real. Then $\lambda \neq \bar{\lambda}$, and $\bar{\lambda} = e^{-i\theta}$. Since M is real,
we note that

$$M\bar{Z} = \bar{\lambda}\bar{Z},$$

so $\bar{Z} = X - iY$ is also an eigenvector with eigenvalue $\bar{\lambda}$. If we write

$$e^{i\theta} = \cos\theta + i\sin\theta$$

then

$$MZ = MX + iMY = (\cos\theta + i\sin\theta)(X + iY)$$

$$= ((\cos\theta)X - (\sin\theta)Y) + i((\cos\theta)Y + (\sin\theta)X),$$

whence taking real and imaginary parts,

$$MX = (\cos\theta)X - (\sin\theta)Y,$$

$$MY = (\sin\theta)X + (\cos\theta)Y.$$

The two vectors X, Y are linearly independent over R, otherwise Z and
$\bar{Z}$ would not have distinct eigenvalues for M. We let

$V_1$ = subspace of V generated by X, Y over R.

Then the formulas for MX and MY above show that $V_1$ is invariant
under T. Thus we have found a 2-dimensional T-invariant subspace. By
Lemma 6.1 which applies to the real case, we conclude that $V_1^\perp$ is also
T-invariant, and

$$V = V_1 \oplus V_1^\perp.$$

We can conclude the proof by induction. Actually, we have proved
more, by showing what the matrix of T is with respect to a suitable
basis, as follows.


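Theorem 6.4. Let V be a finite dimensional vector space over the reals,
of dimension > 0 and with a positive definite scalar product. Let T be a
unitary operator on V. Then there exists a basis of V such that the
matrix of T with respect to this basis consists of blocks

$$\begin{pmatrix} M_1 & O & \cdots & O \\ O & M_2 & \cdots & O \\ \vdots & \vdots & \ddots & \vdots \\ O & O & \cdots & M_r \end{pmatrix}$$

such that each $M_i$ is a 1 x 1 matrix or a 2 x 2 matrix, of the following
types:

$$(1), \quad (-1), \quad \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.$$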


We observe that on each component space $V_i$ in the decomposition

$$V = V_1 \oplus \cdots \oplus V_r$$

the linear map T is either the identity $I$, or the reflection $-I$, or a
rotation. This is the geometric content of Theorem 6.3 and Theorem 6.4.
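
The computation of Case 2 can be followed on a single rotation block. The sketch below is an illustration under our own assumptions: the angle 0.7 and the vectors X = (1, 0), Y = (0, -1) are chosen by hand so that Z = X + iY is an eigenvector with eigenvalue e^{iθ}; `apply` is a hypothetical helper.

```python
import cmath
import math

theta = 0.7  # arbitrary angle chosen for illustration
M = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]

def apply(M, v):
    """Matrix-vector product for a 2 x 2 matrix and a vector of length 2."""
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

# Complex eigenvector Z = X + iY with eigenvalue e^{i theta}.
X, Y = [1.0, 0.0], [0.0, -1.0]
Z = [X[0] + 1j * Y[0], X[1] + 1j * Y[1]]
lam = cmath.exp(1j * theta)
MZ = apply(M, Z)

# Real and imaginary parts of MZ = e^{i theta} Z:
# MX = cos(theta) X - sin(theta) Y and MY = sin(theta) X + cos(theta) Y.
MX, MY = apply(M, X), apply(M, Y)
MX_formula = [math.cos(theta) * X[i] - math.sin(theta) * Y[i] for i in range(2)]
MY_formula = [math.sin(theta) * X[i] + math.cos(theta) * Y[i] for i in range(2)]
```

The entries of `MZ` agree with those of `lam * Z`, and `MX`, `MY` agree with the two formulas, so the real plane generated by X and Y is mapped into itself.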
IX, §1. POLYNOMIALS

Let K be a field. By a polynomial over K we shall mean a formal
expression

$$f(t) = a_n t^n + \cdots + a_0,$$

where t is a "variable". We have to explain how to form the sum and
product of such expressions. Let

$$g(t) = b_m t^m + \cdots + b_0$$

be another polynomial with $b_j \in K$. If, say, $n \geq m$ we can write $b_j = 0$ if
$j > m$,

$$g(t) = 0t^n + \cdots + b_m t^m + \cdots + b_0,$$

and then we can write the sum $f + g$ as

$$(f + g)(t) = (a_n + b_n)t^n + \cdots + (a_0 + b_0).$$

Thus $f + g$ is again a polynomial. If $c \in K$, then

$$(cf)(t) = ca_n t^n + \cdots + ca_0,$$

and hence $cf$ is a polynomial. Thus polynomials form a vector space
over K.






We can also take the product of the two polynomials, $fg$, and

$$(fg)(t) = (a_n b_m)t^{n+m} + \cdots + a_0 b_0,$$

so that $fg$ is again a polynomial. In fact, if we write

$$(fg)(t) = c_{n+m}t^{n+m} + \cdots + c_0,$$

then

$$c_k = \sum_{i=0}^{k} a_i b_{k-i} = a_0 b_k + a_1 b_{k-1} + \cdots + a_k b_0.$$

All the preceding rules are probably familiar to you but we have recalled 
them to get in the right mood. 
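
These rules translate directly into operations on lists of coefficients. The following is a minimal Python sketch, in which a polynomial is stored as the list [a_0, a_1, ..., a_n]; this representation and the names `poly_mul`, `poly_add` are our own choices.

```python
def poly_mul(a, b):
    """Multiply polynomials given as coefficient lists [a_0, a_1, ..., a_n]."""
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj      # a_i b_j contributes to c_{i+j}
    return c

def poly_add(a, b):
    """Add polynomials, padding the shorter list with zero coefficients."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return [x + y for x, y in zip(a, b)]

f = [-1, 1]                       # t - 1
g = [3, 1]                        # t + 3
product = poly_mul(f, g)          # coefficients of t^2 + 2t - 3
total = poly_add(f, [5, 0, 2])    # (t - 1) + (2t^2 + 5)
```

The list `product` has length deg f + deg g + 1, in agreement with the formula for the coefficients of the product.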

When we write a polynomial $f$ in the form

$$f(t) = a_n t^n + \cdots + a_0$$

with $a_i \in K$, then the numbers $a_0, \ldots, a_n$ are called the coefficients of the
polynomial. If n is the largest integer such that $a_n \neq 0$, then we say that
n is the degree of $f$ and write $n = \deg f$. We also say that $a_n$ is the
leading coefficient of $f$. We say that $a_0$ is the constant term of $f$. If $f$ is the
zero polynomial, then we shall use the convention that $\deg f = -\infty$.
We agree to the convention that

$$-\infty + -\infty = -\infty, \quad -\infty + a = -\infty, \quad -\infty < a$$

for every integer a, and no other operation with $-\infty$ is defined.

The reason for our convention is that it makes the following theorem 
true without exception. 


Theorem 1.1. Let $f$, $g$ be polynomials with coefficients in K. Then

$$\deg(fg) = \deg f + \deg g.$$

Proof. Let

$$f(t) = a_n t^n + \cdots + a_0 \quad \text{and} \quad g(t) = b_m t^m + \cdots + b_0$$

with $a_n \neq 0$ and $b_m \neq 0$. Then from the multiplication rule for $fg$, we see
that

$$f(t)g(t) = a_n b_m t^{n+m} + \text{terms of lower degree},$$

and $a_n b_m \neq 0$. Hence $\deg fg = n + m = \deg f + \deg g$. If $f$ or $g$ is 0, then
our convention about $-\infty$ makes our assertion also come out.

A polynomial of degree 1 is also called a linear polynomial. 

By a root $\alpha$ of $f$ we shall mean a number such that $f(\alpha) = 0$. We
admit without proof the following statement:

Theorem 1.2. Let f be a polynomial with complex coefficients, of degree
$\geq 1$. Then f has a root in C.

We shall prove this theorem in an appendix, using some facts of
analysis.

Theorem 1.3. Let f be a polynomial with complex coefficients, leading
coefficient 1, and $\deg f = n \geq 1$. Then there exist complex numbers
$\alpha_1, \ldots, \alpha_n$ such that

$$f(t) = (t - \alpha_1) \cdots (t - \alpha_n).$$

The numbers $\alpha_1, \ldots, \alpha_n$ are uniquely determined up to a permutation.
Every root $\alpha$ of f is equal to some $\alpha_i$, and conversely.

Proof. We shall give the proof of Theorem 1.3 (assuming Theorem
1.2) completely in Chapter XI. Since in this chapter, and the next two
chapters, we do not need to know anything about polynomials except
the simple statements of this section, we feel it is better to postpone the
proof to this later chapter. Furthermore, the further theory of
polynomials developed in Chapter XI will also have further applications to
the theory of linear maps and matrices.

As a matter of terminology, let $\alpha_1, \ldots, \alpha_r$ be the distinct roots of the
polynomial $f$ in C. Then we can write

$$f(t) = (t - \alpha_1)^{m_1} \cdots (t - \alpha_r)^{m_r},$$

with integers $m_1, \ldots, m_r > 0$, uniquely determined. We say that $m_i$ is the
multiplicity of $\alpha_i$ in $f$.


IX, §2. POLYNOMIALS OF MATRICES AND LINEAR MAPS 

The set of polynomials with coefficients in K will be denoted by the 
symbols K[t]. 

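Let A be a square matrix with coefficients in K. Let $f \in K[t]$, and
write

$$f(t) = a_n t^n + \cdots + a_0$$

with $a_i \in K$. We define

$$f(A) = a_n A^n + \cdots + a_0 I.$$

Example 1. Let $f(t) = 3t^2 - 2t + 5$. Let $A = \begin{pmatrix} 1 & -1 \\ 2 & 0 \end{pmatrix}$. Then

$$f(A) = 3\begin{pmatrix} 1 & -1 \\ 2 & 0 \end{pmatrix}^2 - \begin{pmatrix} 2 & -2 \\ 4 & 0 \end{pmatrix} + \begin{pmatrix} 5 & 0 \\ 0 & 5 \end{pmatrix} = \begin{pmatrix} 0 & -1 \\ 2 & -1 \end{pmatrix}.$$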

Theorem 2.1. Let $f, g \in K[t]$. Let A be a square matrix with
coefficients in K. Then

$$(f + g)(A) = f(A) + g(A),$$

$$(fg)(A) = f(A)g(A).$$

If $c \in K$, then $(cf)(A) = cf(A)$.

Proof. Let $f(t)$ and $g(t)$ be written in the form

$$f(t) = a_n t^n + \cdots + a_0$$

and

$$g(t) = b_m t^m + \cdots + b_0$$

with $a_i, b_j \in K$. Then

$$(fg)(t) = c_{m+n}t^{m+n} + \cdots + c_0,$$

where

$$c_k = \sum_{i=0}^{k} a_i b_{k-i}.$$

By definition,

$$(fg)(A) = c_{m+n}A^{m+n} + \cdots + c_0 I.$$

On the other hand,

$$f(A) = a_n A^n + \cdots + a_0 I$$

and

$$g(A) = b_m A^m + \cdots + b_0 I.$$

Hence

$$f(A)g(A) = \sum_{i=0}^{n} \sum_{j=0}^{m} a_i A^i b_j A^j = \sum_{i=0}^{n} \sum_{j=0}^{m} a_i b_j A^{i+j} = \sum_{k=0}^{m+n} c_k A^k.$$

Thus $f(A)g(A) = (fg)(A)$.

For the sum, suppose $n \geq m$, and let $b_j = 0$ if $j > m$. We have

$$(f + g)(A) = (a_n + b_n)A^n + \cdots + (a_0 + b_0)I$$

$$= a_n A^n + b_n A^n + \cdots + a_0 I + b_0 I$$

$$= f(A) + g(A).$$

If $c \in K$, then

$$(cf)(A) = ca_n A^n + \cdots + ca_0 I = cf(A).$$

This proves our theorem.

Example 2. Let $f(t) = (t - 1)(t + 3) = t^2 + 2t - 3$. Then

$$f(A) = A^2 + 2A - 3I = (A - I)(A + 3I).$$

If we multiply this last product directly using the rules for multiplication
of matrices, we obtain in fact

$$A^2 - IA + 3AI - 3I^2 = A^2 + 2A - 3I.$$
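
The identity of Example 2 can be checked on a concrete matrix. In the following Python sketch the matrix A is an arbitrary choice of ours, and `mat_mul`, `mat_add`, `scale` are hypothetical helper names.

```python
def mat_mul(A, B):
    """Product of two n x n matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(A, B):
    return [[x + y for x, y in zip(r, s)] for r, s in zip(A, B)]

def scale(c, A):
    return [[c * x for x in row] for row in A]

I = [[1, 0], [0, 1]]
A = [[1, -1], [2, 0]]    # arbitrary matrix, chosen for illustration

# Expanded form A^2 + 2A - 3I and factored form (A - I)(A + 3I).
expanded = mat_add(mat_add(mat_mul(A, A), scale(2, A)), scale(-3, I))
factored = mat_mul(mat_add(A, scale(-1, I)), mat_add(A, scale(3, I)))
```

The two results coincide because A commutes with I, so the factors may be multiplied out as for ordinary polynomials.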

Example 3. Let $\alpha_1, \ldots, \alpha_n$ be numbers. Let

$$f(t) = (t - \alpha_1) \cdots (t - \alpha_n).$$

Then

$$f(A) = (A - \alpha_1 I) \cdots (A - \alpha_n I).$$

Let V be a vector space over K, and let $A : V \to V$ be an operator (i.e.
linear map of V into itself). Then we can form $A^2 = A \circ A = AA$, and in
general $A^n$ = iteration of A taken n times for any positive integer n. We
define $A^0 = I$ (where I now denotes the identity mapping). We have

$$A^{m+n} = A^m A^n$$

for all integers $m, n \geq 0$. If $f$ is a polynomial in $K[t]$, then we can form
$f(A)$ the same way that we did for matrices, and the same rules hold as
stated in Theorem 2.1. The proofs are the same. The essential thing that
we used was the ordinary laws of addition and multiplication, and these
hold also for linear maps.

Theorem 2.2. Let A be an n x n matrix in a field K. Then there exists
a non-zero polynomial $f \in K[t]$ such that $f(A) = O$.

Proof. The vector space of n x n matrices over K is finite dimensional,
of dimension $n^2$. Hence the powers

$$I, A, A^2, \ldots, A^N$$

are linearly dependent for $N > n^2$. This means that there exist numbers
$a_0, \ldots, a_N \in K$ such that not all $a_i = 0$, and

$$a_N A^N + \cdots + a_0 I = O.$$

We let $f(t) = a_N t^N + \cdots + a_0$ to get what we want.

As with Theorem 2.1, we note that Theorem 2.2 also holds for a 
linear map A of a finite dimensional vector space over K. The proof 
is again the same, and we shall use Theorem 2.2 indiscriminately for 
matrices or linear maps. 

We shall determine later in Chapter X, §2 a polynomial P(t) which 
can be constructed explicitly such that P(A) = O. 
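
The linear dependence used in the proof of Theorem 2.2 can be found by elimination. The sketch below is one possible way to do this and is not the explicit construction of Chapter X: it flattens the powers I, A, ..., A^{n^2} (already n^2 + 1 matrices, hence dependent) into vectors and row reduces with exact rational arithmetic. The names `mat_mul` and `annihilating_poly` are hypothetical.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Product of two n x n matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def annihilating_poly(A):
    """Return [a_0, ..., a_N], not all zero, with a_N A^N + ... + a_0 I = O."""
    n = len(A)
    powers = [[[Fraction(int(i == j)) for j in range(n)] for i in range(n)]]
    for _ in range(n * n):
        powers.append(mat_mul(powers[-1], A))
    # Column k of the linear system is A^k flattened to a vector of length n^2.
    rows = [[P[i][j] for P in powers] for i in range(n) for j in range(n)]
    m = len(powers)                       # n^2 + 1 unknowns, n^2 equations
    pivots = []                           # pivot column of each reduced row
    r = 0
    for col in range(m):
        pr = next((k for k in range(r, len(rows)) if rows[k][col] != 0), None)
        if pr is None:
            continue                      # no pivot here: a free column
        rows[r], rows[pr] = rows[pr], rows[r]
        rows[r] = [x / rows[r][col] for x in rows[r]]
        for k in range(len(rows)):
            if k != r and rows[k][col] != 0:
                factor = rows[k][col]
                rows[k] = [x - factor * y for x, y in zip(rows[k], rows[r])]
        pivots.append(col)
        r += 1
    free = next(c for c in range(m) if c not in pivots)   # exists since m > n^2
    coeffs = [Fraction(0)] * m
    coeffs[free] = Fraction(1)            # set the first free unknown to 1
    for row, p in zip(rows, pivots):
        coeffs[p] = -row[free]            # solve for the pivot unknowns
    return coeffs

coeffs = annihilating_poly([[1, -1], [2, 0]])
```

For this sample matrix the result corresponds to the polynomial t^2 - t + 2, already of leading coefficient 1.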

If we divide the polynomial $f$ of Theorem 2.2 by its leading coefficient,
then we obtain a polynomial $g$ with leading coefficient 1 such that
$g(A) = O$. It is usually convenient to deal with polynomials whose
leading coefficient is 1, since it simplifies the notation.


IX, §2. EXERCISES 


1. Compute $f(A)$ when $f(t) = t^3 - 2t + 1$ and A =

2. Let A be a symmetric matrix, and let $f$ be a polynomial with real coefficients.
Show that $f(A)$ is also symmetric.

3. Let A be a hermitian matrix, and let $f$ be a polynomial with real coefficients.
Show that $f(A)$ is hermitian.

4. Let A, B be n x n matrices in a field K, and assume that B is invertible.
Show that

$$(B^{-1}AB)^n = B^{-1}A^nB$$

for all positive integers n.

5. Let $f \in K[t]$. Let A, B be as in Exercise 4. Show that

$$f(B^{-1}AB) = B^{-1}f(A)B.$$



CHAPTER X 


Triangulation of Matrices 
and Linear Maps 


X, §1. EXISTENCE OF TRIANGULATION 

Let V be a finite dimensional vector space over the field K, and assume
$n = \dim V \geq 1$. Let $A : V \to V$ be a linear map. Let W be a subspace of
V. We shall say that W is an invariant subspace of A, or is A-invariant, if
A maps W into itself. This means that if $w \in W$, then Aw is also
contained in W. We also express this property by writing $AW \subset W$. By a
fan of A (in V) we shall mean a sequence of subspaces $\{V_1, \ldots, V_n\}$ such
that $V_i$ is contained in $V_{i+1}$ for each $i = 1, \ldots, n - 1$, such that $\dim V_i = i$,
and finally such that each $V_i$ is A-invariant. We see that the dimensions
of the subspaces $V_1, \ldots, V_n$ increase by 1 from one subspace to the next.
Furthermore, $V = V_n$.

We shall give an interpretation of fans by matrices. Let $\{V_1, \ldots, V_n\}$ be
a fan for A. By a fan basis we shall mean a basis $\{v_1, \ldots, v_n\}$ of V such
that $\{v_1, \ldots, v_i\}$ is a basis for $V_i$. One sees immediately that a fan basis
exists. For instance, let $v_1$ be a basis for $V_1$. We extend $v_1$ to a basis
$\{v_1, v_2\}$ of $V_2$ (possible by an old theorem), then to a basis $\{v_1, v_2, v_3\}$ of
$V_3$, and so on inductively to a basis $\{v_1, \ldots, v_n\}$ of $V_n$.

Theorem 1.1. Let $\{v_1, \ldots, v_n\}$ be a fan basis for A. Then the matrix
associated with A relative to this basis is an upper triangular matrix.

Proof. Since $AV_i$ is contained in $V_i$ for each $i = 1, \ldots, n$, there exist
numbers $a_{ij}$ such that

$$\begin{aligned}
Av_1 &= a_{11}v_1, \\
Av_2 &= a_{12}v_1 + a_{22}v_2, \\
&\;\;\vdots \\
Av_i &= a_{1i}v_1 + a_{2i}v_2 + \cdots + a_{ii}v_i, \\
&\;\;\vdots \\
Av_n &= a_{1n}v_1 + a_{2n}v_2 + \cdots + a_{nn}v_n.
\end{aligned}$$

This means that the matrix associated with A with respect to our basis is
the triangular matrix

$$\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ 0 & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{pmatrix},$$

as was to be shown.

Remark. Let A be an upper triangular matrix as above. We view A
as a linear map of $K^n$ into itself. Then the column unit vectors $e^1, \ldots, e^n$
form a fan basis for A. If we let $V_i$ be the space generated by $e^1, \ldots, e^i$,
then $\{V_1, \ldots, V_n\}$ is the corresponding fan. Thus the converse of Theorem
1.1 is also obviously true.
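
The remark amounts to a condition on the columns of the matrix, which is easy to test. In the following Python sketch the function name is hypothetical and the two sample matrices are our own.

```python
def is_fan_of_standard_subspaces(A):
    """Check that A maps span(e^1, ..., e^i) into itself for every i."""
    n = len(A)
    for i in range(n):                 # column i holds the coordinates of A e^{i+1}
        if any(A[row][i] != 0 for row in range(i + 1, n)):
            return False               # A e^{i+1} has a component outside V_{i+1}
    return True

upper = [[1, 2, 3], [0, 4, 5], [0, 0, 6]]
not_upper = [[1, 2], [3, 4]]
```

The function returns `True` for `upper` and `False` for `not_upper`: the spaces generated by the first i unit vectors form a fan exactly when all entries below the diagonal vanish.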

We recall that it is not always the case that one can find an
eigenvector (or eigenvalue) for a linear map if the given field K is not the
complex numbers. Similarly, it is not always true that we can find a fan for
a linear map when K is the real numbers. If $A : V \to V$ is a linear map,
and if there exists a basis for V for which the associated matrix of A is
triangular, then we say that A is triangulable. Similarly, if A is an n x n
matrix, over the field K, we say that A is triangulable over K if it is
triangulable as a linear map of $K^n$ into itself. This is equivalent to
saying that there exists a non-singular matrix B in K such that $B^{-1}AB$ is
an upper triangular matrix.

Using the existence of eigenvectors over the complex numbers, we
shall prove that any matrix or linear map can be triangulated over the
complex numbers.

Theorem 1.2. Let V be a finite dimensional vector space over the
complex numbers, and assume that $\dim V \geq 1$. Let $A : V \to V$ be a linear
map. Then there exists a fan of A in V.

Proof. We shall prove the theorem by induction. If $\dim V = 1$ then
there is nothing more to prove. Assume that the theorem is true when
$\dim V = n - 1$, $n > 1$. By Theorem 2.3 of Chapter IX there exists a
non-zero eigenvector $v_1$ for A. We let $V_1$ be the subspace of dimension 1
generated by $v_1$. We can write V as a direct sum $V = V_1 \oplus W$ for some
subspace W (by Theorem 4.2 of Chapter I asserting essentially that we
can extend linearly independent vectors to a basis). The trouble now is
that A need not map W into itself. Let $P_1$ be the projection of V on $V_1$,
and let $P_2$ be the projection of V on W. Then $P_2A$ is a linear map of V
into V, which maps W into W (because $P_2$ maps any element of V into
W). Thus we view $P_2A$ as a linear map of W into itself. By induction,
there exists a fan of $P_2A$ in W, say $\{W_1, \ldots, W_{n-1}\}$. We let

$$V_i = V_1 + W_{i-1}$$

for $i = 2, \ldots, n$. Then $V_i$ is contained in $V_{i+1}$ for each $i = 1, \ldots, n - 1$ and one
verifies immediately that $\dim V_i = i$.

(If $\{u_1, \ldots, u_{n-1}\}$ is a basis of W such that $\{u_1, \ldots, u_j\}$ is a basis of $W_j$,
then $\{v_1, u_1, \ldots, u_{i-1}\}$ is a basis of $V_i$ for $i = 2, \ldots, n$.)

To prove that $\{V_1, \ldots, V_n\}$ is a fan for A in V, it will suffice to prove
that $AV_i$ is contained in $V_i$. To do this, we note that

$$A = IA = (P_1 + P_2)A = P_1A + P_2A.$$

Let $v \in V_i$. We can write $v = cv_1 + w_{i-1}$, with $c \in \mathbf{C}$ and $w_{i-1} \in W_{i-1}$. Then
$P_1Av = P_1(Av)$ is contained in $V_1$, and hence in $V_i$. Furthermore,

$$P_2Av = P_2A(cv_1) + P_2Aw_{i-1}.$$

Since $P_2A(cv_1) = cP_2Av_1$, and since $v_1$ is an eigenvector of A, say
$Av_1 = \lambda_1v_1$, we find $P_2A(cv_1) = P_2(c\lambda_1v_1) = O$. By induction hypothesis,
$P_2A$ maps $W_{i-1}$ into itself, and hence $P_2Aw_{i-1}$ lies in $W_{i-1}$. Hence $P_2Av$ lies
in $V_i$, thereby proving our theorem.

Corollary 1.3. Let V be a finite dimensional vector space over the complex
numbers, and assume that $\dim V \geq 1$. Let $A\colon V \to V$ be a linear
map. Then there exists a basis of V such that the matrix of A with
respect to this basis is a triangular matrix.

Proof. We had already given the arguments preceding Theorem 1.1. 
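
As an illustration of a fan basis (a worked example that is not part of the original text; the matrix A below is chosen here only for convenience), one can triangulate a $2 \times 2$ matrix by hand, starting from one eigenvector and extending to a basis:

```latex
% Hypothetical example matrix, chosen for illustration.
A = \begin{pmatrix} 1 & 1 \\ -1 & 3 \end{pmatrix},
\qquad P_A(t) = t^2 - 4t + 4 = (t-2)^2 .

% An eigenvector for the eigenvalue 2, extended to a basis of C^2:
v_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad
v_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \qquad
A v_1 = 2 v_1, \quad
A v_2 = \begin{pmatrix} 1 \\ 3 \end{pmatrix} = v_1 + 2 v_2 .

% With V_1 = span(v_1) and V_2 = C^2, {V_1, V_2} is a fan for A, and
% with B = (v_1 \; v_2) as columns:
B = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}, \qquad
B^{-1} A B = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}.
```

The diagonal entries of the triangular matrix are the eigenvalues of A, here 2 repeated twice.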

Corollary 1.4. Let M be a matrix of complex numbers. There exists a
non-singular matrix B such that $B^{-1}MB$ is a triangular matrix.

Proof. This is the standard interpretation of the change of matrices
when we change bases, applied to the case covered by Corollary 1.3.


X, §1. EXERCISES


1. Let A be an upper triangular matrix:

$$A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
0 & a_{22} & \cdots & a_{2n} \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & a_{nn}
\end{pmatrix}.$$

Viewing A as a linear map, what are the eigenvalues of $A^2$, $A^3$, in
general of $A^r$ where r is an integer $\geq 1$?

2. Let A be a square matrix. We say that A is nilpotent if there exists an
integer $r \geq 1$ such that $A^r = O$. Show that if A is nilpotent, then
all eigenvalues of A are equal to 0.

3. Let V be a finite dimensional space over the complex numbers, and let
$A\colon V \to V$ be a linear map. Assume that all eigenvalues of A are
equal to 0. Show that A is nilpotent.

(In the two preceding exercises, try the $2 \times 2$ case explicitly first.)

4. Using fans, give a proof that the inverse of an invertible triangular
matrix is also triangular. In fact, if V is a finite dimensional vector
space, if $A\colon V \to V$ is a linear map which is invertible, and if
$\{V_1,\ldots,V_n\}$ is a fan for A, show that it is also a fan for $A^{-1}$.

5. Let A be a square matrix of complex numbers such that $A^r = I$ for some
positive integer r. If $\alpha$ is an eigenvalue of A, show that
$\alpha^r = 1$.

6. Find a fan basis for the linear maps of C 2 represented by the matrices 


(a) 





7. Prove that an operator $A\colon V \to V$ on a finite dimensional vector
space over C can be written as a sum $A = D + N$, where D is diagonalizable
and N is nilpotent.


We shall now give an application of triangulation to a special type of 
matrix. 

Let $A = (a_{ij})$ be an $n \times n$ complex matrix. If the sum of the
elements of each column is 1 then A is called a Markov matrix. In symbols,
for each j we have

$$\sum_i a_{ij} = 1.$$

We leave the following properties as exercises. 

Property 1. Prove that if A, B are Markov matrices, then so is AB. In
particular, if A is a Markov matrix, then $A^k$ is a Markov matrix for every
positive integer k.

Property 2. Prove that if A, B are Markov matrices such that
$|a_{ij}| \leq 1$ and $|b_{ij}| \leq 1$ for all i, j and if
$AB = C = (c_{ij})$, then $|c_{ij}| \leq 1$ for all i, j.
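
The two properties can be checked numerically on a small case. The following is only a sketch with exact rational arithmetic: the helper names `mat_mul`, `mat_pow` and `is_markov` and the matrices A, B are chosen here for illustration, and the entries are assumed to be non-negative real numbers (the usual stochastic case), so that the bound of Property 2 is easy to observe.

```python
from fractions import Fraction as F

def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(a, k):
    """k-th power of a for an integer k >= 1, by repeated multiplication."""
    result = a
    for _ in range(k - 1):
        result = mat_mul(result, a)
    return result

def is_markov(a):
    """True if the entries of every column of a sum to 1."""
    n = len(a)
    return all(sum(a[i][j] for i in range(n)) == 1 for j in range(n))

# Two Markov matrices with non-negative entries (illustrative choice).
A = [[F(1, 2), F(1, 3)],
     [F(1, 2), F(2, 3)]]
B = [[F(1, 4), F(1)],
     [F(3, 4), F(0)]]

C = mat_mul(A, B)     # Property 1: the column sums of AB are again 1
A5 = mat_pow(A, 5)    # hence every power of A is a Markov matrix
print(is_markov(C), is_markov(A5))
```

The reason behind Property 1 is visible in the double sum: the j-th column sum of AB is $\sum_k b_{kj} \sum_i a_{ik} = \sum_k b_{kj} = 1$.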

Theorem 1.5. Let A be a Markov matrix such that $|a_{ij}| \leq 1$ for all
i, j. Then every eigenvalue of A has absolute value $\leq 1$.

Proof. By Corollary 1.4 there exists a matrix B such that $BAB^{-1}$ is
triangular. Let $\lambda_1,\ldots,\lambda_n$ be the diagonal elements. Then

$$BA^kB^{-1} = (BAB^{-1})^k$$

and so

$$BA^kB^{-1} = \begin{pmatrix}
\lambda_1^k & & * \\
 & \ddots & \\
0 & & \lambda_n^k
\end{pmatrix}.$$

But $A^k$ is a Markov matrix for each k, and each component of $A^k$ has
absolute value $\leq 1$ by Property 2. Then the components of $BA^kB^{-1}$
have bounded absolute values. If for some i we have $|\lambda_i| > 1$, then
$|\lambda_i^k| \to \infty$ as $k \to \infty$, which contradicts the preceding
assertion and concludes the proof.



X, §2. THEOREM OF HAMILTON-CAYLEY 

Let V be a finite dimensional vector space over a field K, and let
$A\colon V \to V$ be a linear map. Assume that V has a basis consisting of
eigenvectors of A, say $\{v_1,\ldots,v_n\}$. Let
$\{\lambda_1,\ldots,\lambda_n\}$ be the corresponding eigenvalues. Then the
characteristic polynomial of A is

$$P(t) = (t - \lambda_1)\cdots(t - \lambda_n),$$

and

$$P(A) = (A - \lambda_1I)\cdots(A - \lambda_nI).$$

The factors $A - \lambda_jI$ commute with each other, so we may write any
one of them last. If we now apply P(A) to any vector $v_i$, then the factor
$A - \lambda_iI$ will kill $v_i$, in other words, $P(A)v_i = O$.
Consequently, $P(A) = O$.

In general, we cannot find a basis as above. However, by using fans,
we can construct a generalization of the argument just used in the
diagonal case.

Theorem 2.1. Let V be a finite dimensional vector space over the complex
numbers, of dimension $\geq 1$, and let $A\colon V \to V$ be a linear map.
Let P be its characteristic polynomial. Then $P(A) = O$.

Proof. By Theorem 1.2, we can find a fan for A, say $\{V_1,\ldots,V_n\}$.
Let

$$\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
0 & a_{22} & \cdots & a_{2n} \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & a_{nn}
\end{pmatrix}$$

be the matrix associated with A with respect to a fan basis,
$\{v_1,\ldots,v_n\}$. Then

$$Av_i = a_{ii}v_i + \text{an element of } V_{i-1}$$

(with the convention $V_0 = \{O\}$), or in other words, since
$(A - a_{ii}I)v_i = Av_i - a_{ii}v_i$, we find that

$$(A - a_{ii}I)v_i \text{ lies in } V_{i-1}.$$

Furthermore, the characteristic polynomial of A is given by

$$P(t) = (t - a_{11})\cdots(t - a_{nn}),$$

so that

$$P(A) = (A - a_{11}I)\cdots(A - a_{nn}I).$$

We shall prove by induction that

$$(A - a_{11}I)\cdots(A - a_{ii}I)v = O$$

for all v in $V_i$, $i = 1,\ldots,n$. When $i = n$, this will yield our
theorem.

Let $i = 1$. Then $(A - a_{11}I)v_1 = Av_1 - a_{11}v_1 = O$ and we are done.

Let $i > 1$, and assume our assertion proved for $i - 1$. Any element of
$V_i$ can be written as a sum $v' + cv_i$ with $v'$ in $V_{i-1}$, and some
scalar c. We note that $(A - a_{ii}I)v'$ lies in $V_{i-1}$ because
$AV_{i-1}$ is contained in $V_{i-1}$, and so is $a_{ii}v'$. By induction,

$$(A - a_{11}I)\cdots(A - a_{i-1,i-1}I)(A - a_{ii}I)v' = O.$$

On the other hand, $(A - a_{ii}I)cv_i$ lies in $V_{i-1}$, and hence by
induction,

$$(A - a_{11}I)\cdots(A - a_{i-1,i-1}I)(A - a_{ii}I)cv_i = O.$$

Hence for v in $V_i$, we have

$$(A - a_{11}I)\cdots(A - a_{ii}I)v = O,$$

thereby proving our theorem.

Corollary 2.2. Let A be an $n \times n$ matrix of complex numbers, and let P
be its characteristic polynomial. Then $P(A) = O$.

Proof. We view A as a linear map of $\mathbf{C}^n$ into itself, and apply the
theorem. 
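
The statement $P(A) = O$ can be checked on a concrete matrix. The sketch below is not the argument of the text: it computes the coefficients of $\det(tI - A)$ by the Faddeev-LeVerrier recurrence (a technique swapped in here because it needs only matrix products and traces) and then evaluates the polynomial at A by Horner's rule. The function names and the example matrix are assumptions made for illustration.

```python
from fractions import Fraction

def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add_scalar(a, c):
    """Return a + c*I."""
    n = len(a)
    return [[a[i][j] + (c if i == j else 0) for j in range(n)]
            for i in range(n)]

def char_poly(a):
    """Coefficients [c_0, ..., c_n] of det(tI - a), with c_n = 1.

    Faddeev-LeVerrier: M_k = a*M_{k-1} + c_{n-k+1} I and
    c_{n-k} = -trace(a*M_k) / k, starting from M_0 = 0.
    """
    n = len(a)
    c = [Fraction(0)] * (n + 1)
    c[n] = Fraction(1)
    m = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        m = add_scalar(mat_mul(a, m), c[n - k + 1])
        am = mat_mul(a, m)
        c[n - k] = -sum(am[i][i] for i in range(n)) / k
    return c

def poly_at_matrix(c, a):
    """Evaluate c_0 I + c_1 a + ... + c_n a^n by Horner's rule."""
    n = len(a)
    result = [[Fraction(0)] * n for _ in range(n)]
    for coeff in reversed(c):
        result = add_scalar(mat_mul(result, a), coeff)
    return result

A = [[2, 1, 0],
     [0, 2, 1],
     [1, 0, 3]]
P = char_poly(A)
print(poly_at_matrix(P, A))   # every entry is expected to be 0
```

For a $2 \times 2$ matrix the recurrence reduces to the familiar $P(t) = t^2 - (\operatorname{tr} A)t + \det A$.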

Corollary 2.3. Let V be a finite dimensional vector space over the field
K, and let $A\colon V \to V$ be a linear map. Let P be the characteristic
polynomial of A. Then $P(A) = O$.

Proof. Take a basis of V, and let M be the matrix representing A
with respect to this basis. Then $P_M = P_A$, and it suffices to prove that
$P_M(M) = O$. But we can apply Theorem 2.1 (in the matrix form of
Corollary 2.2) to conclude the proof.

Remark. One can base a proof of Theorem 2.1 on a continuity 
argument. Given a complex matrix A, one can, by various methods 
into which we don’t go here, prove that there exist matrices Z of the 
same size as A, lying arbitrarily close to A (i.e. each component of Z 
is close to the corresponding component of A) such that $P_Z$ has all its
roots of multiplicity 1. In fact, the complex polynomials having roots of
multiplicity > 1 are thinly distributed among all polynomials. Now, if Z
is as above, then the linear map it represents is diagonalizable (because
Z has distinct eigenvalues), and hence $P_Z(Z) = O$ trivially, as noted at
the beginning of this section. However, $P_Z(Z)$ approaches $P_A(A)$ as Z
approaches A. Hence $P_A(A) = O$.


X, §3. DIAGONALIZATION OF UNITARY MAPS 

Using the methods of this chapter, we shall give a new proof for the
following theorem, already proved in Chapter VIII.

Theorem 3.1. Let V be a finite dimensional vector space over the complex
numbers, and let $\dim V \geq 1$. Assume given a positive definite
hermitian product on V. Let $A\colon V \to V$ be a unitary map. Then there
exists an orthogonal basis of V consisting of eigenvectors of A.

Proof. First observe that if w is an eigenvector for A, with eigenvalue
$\lambda$, then $Aw = \lambda w$, and $\lambda \neq 0$ because A preserves
length.

By Theorem 1.2, we can find a fan for A, say $\{V_1,\ldots,V_n\}$. Let
$\{v_1,\ldots,v_n\}$ be a fan basis. We can use the Gram-Schmidt
orthogonalization process to orthogonalize it. We recall the process:

$$v_1' = v_1,$$

$$v_2' = v_2 - \frac{\langle v_2, v_1'\rangle}{\langle v_1', v_1'\rangle}\,v_1',$$

and in general
$v_i' = v_i - \sum_{k<i} \frac{\langle v_i, v_k'\rangle}{\langle v_k', v_k'\rangle}\,v_k'$.

From this construction, we see that $\{v_1',\ldots,v_n'\}$ is an orthogonal
basis which is again a fan basis, because $\{v_1',\ldots,v_i'\}$ is a basis
of the same space $V_i$ as $\{v_1,\ldots,v_i\}$. Dividing each $v_i'$ by its
norm we obtain a fan basis $\{w_1,\ldots,w_n\}$ which is orthonormal. We
contend that each $w_i$ is an eigenvector for A. We proceed by induction.
Since $Aw_1$ is contained in $V_1$, there exists a scalar $\lambda_1$ such
that $Aw_1 = \lambda_1w_1$, so that $w_1$ is an eigenvector, and
$\lambda_1 \neq 0$. Assume that we have already proved that
$w_1,\ldots,w_{i-1}$ are eigenvectors with non-zero eigenvalues. There exist
scalars $c_1,\ldots,c_i$ such that

$$Aw_i = c_1w_1 + \cdots + c_iw_i.$$

Since A preserves perpendicularity, $Aw_i$ is perpendicular to $Aw_k$ for
every $k < i$. But $Aw_k = \lambda_kw_k$. Hence $Aw_i$ is perpendicular to
$w_k$ itself, and hence $c_k = 0$. Hence $Aw_i = c_iw_i$, and $c_i \neq 0$
because A preserves length. We can thus go from 1 to n to prove our theorem.

Corollary 3.2. Let A be a complex unitary matrix. Then there exists a
unitary matrix U such that $U^{-1}AU$ is a diagonal matrix.

Proof. Let $\mathcal{B} = \{e_1,\ldots,e_n\}$ be the standard orthonormal
basis of $\mathbf{C}^n$, and let $\mathcal{B}' = \{w_1,\ldots,w_n\}$ be an
orthonormal basis which diagonalizes A, viewed as a linear map of
$\mathbf{C}^n$ into itself. Let

$$U = M^{\mathcal{B}'}_{\mathcal{B}}(\mathrm{id}).$$

Then U is unitary (cf. Exercise 5 of Chapter VII, §3), and if $M'$ is the
matrix of A relative to the basis $\mathcal{B}'$, then

$$M' = U^{-1}AU.$$


This proves the Corollary. 
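
A small numerical check of the theorem, as a sketch only: the matrix A below (rotation by a quarter turn) and the helper names are chosen here for illustration. It verifies that A is unitary, that the two given vectors are eigenvectors, that they are perpendicular for the standard hermitian product, and that the eigenvalues have absolute value 1.

```python
def hermitian_product(v, w):
    """Standard hermitian product <v, w> = sum of v_k * conj(w_k)."""
    return sum(x * complex(y).conjugate() for x, y in zip(v, w))

def apply(a, v):
    """Apply the matrix a (list of rows) to the vector v."""
    return [sum(a[i][k] * v[k] for k in range(len(v))) for i in range(len(a))]

def is_unitary(a):
    """True if the columns of a are orthonormal for the hermitian product."""
    n = len(a)
    cols = [[a[i][j] for i in range(n)] for j in range(n)]
    return all(hermitian_product(cols[j], cols[k]) == (1 if j == k else 0)
               for j in range(n) for k in range(n))

# Rotation by a quarter turn: real orthogonal, hence unitary.
A = [[0, -1],
     [1, 0]]

# Candidate eigenvectors with their eigenvalues i and -i.
w1, lam1 = [1, -1j], 1j
w2, lam2 = [1, 1j], -1j

print(is_unitary(A))
print(apply(A, w1) == [lam1 * x for x in w1])
print(apply(A, w2) == [lam2 * x for x in w2])
print(hermitian_product(w1, w2))
```

Dividing $w_1$ and $w_2$ by their common norm $\sqrt{2}$ gives an orthonormal basis, and the matrix with these columns plays the role of U in the corollary.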


X, §3. EXERCISES 

1. Let A be a complex unitary matrix. Show that each eigenvalue of A can be
written $e^{i\theta}$ with some real $\theta$.

2. Let A be a complex unitary matrix. Show that there exists a diagonal
matrix B and a complex unitary matrix U such that $A = U^{-1}BU$.



CHAPTER XI 


Polynomials and Primary 
Decomposition 


XI, §1. THE EUCLIDEAN ALGORITHM 

We have already defined polynomials, and their degree, in Chapter IX.
In this chapter, we deal with the other standard properties of
polynomials. The basic one is the Euclidean algorithm, or long division,
taught (presumably) in all elementary schools.


Theorem 1.1. Let f, g be polynomials over the field K, i.e. polynomials
in K[t], and assume $\deg g \geq 0$. Then there exist polynomials q, r in
K[t] such that

$$f(t) = q(t)g(t) + r(t),$$

and $\deg r < \deg g$. The polynomials q, r are uniquely determined by
these conditions.

Proof. Let $m = \deg g \geq 0$. Write

$$f(t) = a_nt^n + \cdots + a_0,$$

$$g(t) = b_mt^m + \cdots + b_0,$$

with $b_m \neq 0$. If $n < m$, let $q = 0$, $r = f$. If $n \geq m$, let

$$f_1(t) = f(t) - a_nb_m^{-1}t^{n-m}g(t).$$

(This is the first step in the process of long division.) Then
$\deg f_1 < \deg f$. Continuing in this way, or more formally by induction
on n, we can find polynomials $q_1$, r such that

$$f_1 = q_1g + r,$$

with $\deg r < \deg g$. Then

$$f(t) = a_nb_m^{-1}t^{n-m}g(t) + f_1(t)$$

$$= a_nb_m^{-1}t^{n-m}g(t) + q_1(t)g(t) + r(t)$$

$$= (a_nb_m^{-1}t^{n-m} + q_1)g(t) + r(t),$$

and we have consequently expressed our polynomial in the desired form.
To prove the uniqueness, suppose that

$$f = q_1g + r_1 = q_2g + r_2,$$

with $\deg r_1 < \deg g$ and $\deg r_2 < \deg g$. Then

$$(q_1 - q_2)g = r_2 - r_1.$$

The degree of the left-hand side is either $\geq \deg g$, or the left-hand
side is equal to 0. The degree of the right-hand side is either
$< \deg g$, or the right-hand side is equal to 0. Hence the only
possibility is that they are both 0, whence

$$q_1 = q_2 \quad\text{and}\quad r_1 = r_2,$$


as was to be shown. 
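
The existence part of the proof is an algorithm, and it can be sketched in a few lines. In the sketch below (an illustration, not part of the text) a polynomial $a_0 + a_1t + \cdots + a_nt^n$ is stored as the list `[a_0, ..., a_n]`, the zero polynomial as the empty list, and coefficients are exact rationals; the names `trim` and `poly_divmod` are chosen here.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zeros so that the last entry is the leading coefficient."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def poly_divmod(f, g):
    """Return (q, r) with f = q*g + r and deg r < deg g (r may be zero)."""
    r = [Fraction(c) for c in trim(f)]
    g = [Fraction(c) for c in trim(g)]
    if not g:
        raise ZeroDivisionError("g must be a non-zero polynomial")
    q = [Fraction(0)] * max(len(r) - len(g) + 1, 0)
    while len(r) >= len(g):
        shift = len(r) - len(g)      # n - m
        coeff = r[-1] / g[-1]        # a_n * b_m^{-1}
        q[shift] = coeff
        # Subtract coeff * t^shift * g(t); this kills the leading term of r.
        for i, b in enumerate(g):
            r[i + shift] -= coeff * b
        r = trim(r)
    return trim(q), r

# f(t) = t^3 - 2t + 5 divided by g(t) = 2t - 1
q, r = poly_divmod([5, -2, 0, 1], [-1, 2])
print(q, r)
```

Each pass through the loop is one step $f \mapsto f - a_nb_m^{-1}t^{n-m}g$ of the proof, and the loop stops as soon as the degree of the remainder drops below that of g.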

Corollary 1.2. Let f be a non-zero polynomial in K[t]. Let $\alpha \in K$ be
such that $f(\alpha) = 0$. Then there exists a polynomial q(t) in K[t] such
that

$$f(t) = (t - \alpha)q(t).$$

Proof. We can write

$$f(t) = q(t)(t - \alpha) + r(t),$$

where $\deg r < \deg(t - \alpha)$. But $\deg(t - \alpha) = 1$. Hence r is
constant. Since

$$0 = f(\alpha) = q(\alpha)(\alpha - \alpha) + r(\alpha) = r(\alpha),$$

it follows that $r = 0$, as desired.

Corollary 1.3. Let K be a field such that every non-constant polynomial
in K[t] has a root in K. Let f be such a polynomial. Then there exist
elements $\alpha_1,\ldots,\alpha_n \in K$ and $c \in K$ such that

$$f(t) = c(t - \alpha_1)\cdots(t - \alpha_n).$$

Proof. In Corollary 1.2, observe that $\deg q = \deg f - 1$. Let
$\alpha = \alpha_1$ in Corollary 1.2. By assumption, if q is not constant,
we can find a root $\alpha_2$ of q, and thus write

$$f(t) = q_2(t)(t - \alpha_1)(t - \alpha_2).$$

Proceeding inductively, we keep on going until $q_n$ is constant.

Assuming as we do that the complex numbers satisfy the hypothesis of
Corollary 1.3, we see that we have proved the existence of a factorization
of a polynomial over the complex numbers into factors of degree 1. The
uniqueness will be proved in the next section.

Corollary 1.4. Let f be a polynomial of degree n in K[t]. There are at
most n roots of f in K.

Proof. Otherwise, if $m > n$, and $\alpha_1,\ldots,\alpha_m$ are distinct
roots of f in K, then

$$f(t) = (t - \alpha_1)\cdots(t - \alpha_m)g(t)$$

for some polynomial g, whence $\deg f \geq m$, contradiction.


XI, §1. EXERCISES 

1. In each of the following cases, write $f = qg + r$ with
$\deg r < \deg g$.

(a) $f(t) = t^2 - 2t + 1$, $g(t) = t - 1$

(b) $f(t) = t^3 + t - 1$, $g(t) = t^2 + 1$

(c) $f(t) = t^3 + t$, $g(t) = t$

(d) $f(t) = t^3 - 1$, $g(t) = t - 1$

2. If f(t) has integer coefficients, and if g(t) has integer coefficients
and leading coefficient 1, show that when we express $f = qg + r$ with
$\deg r < \deg g$, the polynomials q and r also have integer coefficients.

3. Using the intermediate value theorem of calculus, show that every
polynomial of odd degree over the real numbers has a root in the real
numbers.

4. Let $f(t) = t^n + \cdots + a_0$ be a polynomial with complex
coefficients, of degree n, and let $\alpha$ be a root. Show that
$|\alpha| \leq n \cdot \max_i |a_i|$. [Hint: Write

$$-\alpha^n = a_{n-1}\alpha^{n-1} + \cdots + a_0.$$

If $|\alpha| > n \cdot \max_i |a_i|$, divide by $\alpha^n$ and take the
absolute value, together with a simple estimate to get a contradiction.]


XI, §2. GREATEST COMMON DIVISOR

We shall define a notion which bears to the set of polynomials K[t] the 
same relation as a subspace bears to a vector space. 

By an ideal of K[t], or a polynomial ideal, or more briefly an ideal, we
shall mean a subset J of K[t] satisfying the following conditions.

The zero polynomial is in J. If f, g are in J, then f + g is in J. If f is
in J, and g is an arbitrary polynomial, then gf is in J.

From this last condition, we note that if $c \in K$, and f is in J, then cf
is also in J. Thus an ideal may be viewed as a vector space over K. But it
is more than that, in view of the fact that it can stand multiplication by
arbitrary elements of K[t], not only constants.

Example 1. Let $f_1,\ldots,f_n$ be polynomials in K[t]. Let J be the set of
all polynomials which can be written in the form

$$g = g_1f_1 + \cdots + g_nf_n$$

with some $g_i \in K[t]$. Then J is an ideal. Indeed, if

$$h = h_1f_1 + \cdots + h_nf_n$$

with $h_j \in K[t]$, then

$$g + h = (g_1 + h_1)f_1 + \cdots + (g_n + h_n)f_n$$

also lies in J. Also, $0 = 0f_1 + \cdots + 0f_n$ lies in J. If f is an
arbitrary polynomial in K[t], then

$$fg = (fg_1)f_1 + \cdots + (fg_n)f_n$$

is also in J. Thus all our conditions are satisfied.

The ideal J in Example 1 is said to be generated by $f_1,\ldots,f_n$, and we
say that $f_1,\ldots,f_n$ are a set of generators.

We note that each $f_i$ lies in the ideal J of Example 1. For instance,

$$f_1 = 1 \cdot f_1 + 0f_2 + \cdots + 0f_n.$$

Example 2. The single element 0 is an ideal. Also, K[t] itself is an
ideal. We note that 1 is a generator for K[t], which is called the unit
ideal.

Example 3. Consider the ideal generated by the two polynomials $t - 1$
and $t - 2$. We contend that it is the unit ideal. Namely,

$$(t - 1) - (t - 2) = 1$$

is in it. Thus it may happen that we are given several generators for an
ideal, and still we may find a single generator for it. We shall describe
more precisely the situation in the subsequent theorems.

Theorem 2.1. Let J be an ideal of K[t]. Then there exists a polynomial
g which is a generator of J.

Proof. If J is the zero ideal, then 0 is a generator. Suppose that J is
not the zero ideal. Let g be a polynomial in J which is not 0, and is of
smallest degree. We assert that g is a generator for J. Let f be any
element of J. By the Euclidean algorithm, we can find polynomials q, r
such that

$$f = qg + r$$

with $\deg r < \deg g$. Then $r = f - qg$, and by the definition of an
ideal, it follows that r also lies in J. Since $\deg r < \deg g$, we must
have $r = 0$. Hence $f = qg$, and g is a generator for J, as desired.

Remark. Let $g_1$ be a non-zero generator for an ideal J, and let
$g_2$ also be a generator. Then there exists a polynomial q such that
$g_1 = qg_2$. Since

$$\deg g_1 = \deg q + \deg g_2,$$

it follows that $\deg g_2 \leq \deg g_1$. By symmetry, we must have

$$\deg g_2 = \deg g_1.$$

Hence q is constant. We can write

$$g_1 = cg_2$$

with some constant c. Write

$$g_2(t) = a_nt^n + \cdots + a_0$$

with $a_n \neq 0$. Take $b = a_n^{-1}$. Then $bg_2$ is also a generator of
J, and its leading coefficient is equal to 1. Thus we can always find a
generator for an ideal ($\neq 0$) whose leading coefficient is 1. It is
furthermore clear that this generator is uniquely determined.

Let f, g be non-zero polynomials. We shall say that g divides f, and
write $g \mid f$, if there exists a polynomial q such that $f = gq$. Let
$f_1$, $f_2$ be polynomials $\neq 0$. By a greatest common divisor of
$f_1$, $f_2$ we shall mean a polynomial g such that g divides $f_1$ and
$f_2$, and furthermore, if h divides $f_1$ and $f_2$, then h divides g.

Theorem 2.2. Let $f_1$, $f_2$ be non-zero polynomials in K[t]. Let g be a
generator for the ideal generated by $f_1$, $f_2$. Then g is a greatest
common divisor of $f_1$ and $f_2$.

Proof. Since $f_1$ lies in the ideal generated by $f_1$, $f_2$, there
exists a polynomial $q_1$ such that

$$f_1 = q_1g,$$

whence g divides $f_1$. Similarly, g divides $f_2$. Let h be a polynomial
dividing both $f_1$ and $f_2$. Write

$$f_1 = h_1h \quad\text{and}\quad f_2 = h_2h$$

with some polynomials $h_1$ and $h_2$. Since g is in the ideal generated by
$f_1$, $f_2$, there are polynomials $g_1$, $g_2$ such that
$g = g_1f_1 + g_2f_2$, whence

$$g = g_1h_1h + g_2h_2h = (g_1h_1 + g_2h_2)h.$$

Consequently h divides g , and our theorem is proved. 
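
A generator of the ideal generated by $f_1$, $f_2$, together with an expression $g = g_1f_1 + g_2f_2$, can be computed by repeated division with remainder. The sketch below uses the extended Euclidean algorithm, which is a standard technique but not the argument given in the text; polynomials are coefficient lists `[a_0, ..., a_n]` with the zero polynomial as `[]`, and all function names are chosen here for illustration.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zeros so that the last entry is the leading coefficient."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
                 for i in range(n)])

def mul(p, q):
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

def poly_divmod(f, g):
    """Return (q, r) with f = q*g + r and deg r < deg g; g must be non-zero."""
    r = [Fraction(c) for c in trim(f)]
    g = [Fraction(c) for c in trim(g)]
    q = [Fraction(0)] * max(len(r) - len(g) + 1, 0)
    while len(r) >= len(g):
        shift, coeff = len(r) - len(g), r[-1] / g[-1]
        q[shift] = coeff
        for i, b in enumerate(g):
            r[i + shift] -= coeff * b
        r = trim(r)
    return trim(q), r

def extended_gcd(f1, f2):
    """Return (g, g1, g2) with g monic, g = g1*f1 + g2*f2, f1 and f2 non-zero."""
    r0, r1 = [Fraction(c) for c in trim(f1)], [Fraction(c) for c in trim(f2)]
    s0, s1, t0, t1 = [1], [], [], [1]   # invariant: r_i = s_i*f1 + t_i*f2
    while r1:
        q, r = poly_divmod(r0, r1)
        neg_q = [-c for c in q]
        r0, r1 = r1, r
        s0, s1 = s1, add(s0, mul(neg_q, s1))
        t0, t1 = t1, add(t0, mul(neg_q, t1))
    lead = r0[-1]
    return ([c / lead for c in r0], [c / lead for c in s0],
            [c / lead for c in t0])

# Example 3 of the text: t - 1 and t - 2 generate the unit ideal.
g, g1, g2 = extended_gcd([-1, 1], [-2, 1])
print(g, g1, g2)
```

The last non-zero remainder lies in the ideal and divides both inputs, so after normalizing its leading coefficient to 1 it is the monic generator described in the Remark following Theorem 2.1.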

Remark 1. The greatest common divisor is determined up to a non-zero
constant multiple. If we select a greatest common divisor with leading
coefficient 1, then it is uniquely determined.

Remark 2. Exactly the same proof applies when we have more than
two polynomials. For instance, if $f_1,\ldots,f_n$ are non-zero
polynomials, and if g is a generator for the ideal generated by
$f_1,\ldots,f_n$, then g is a greatest common divisor of $f_1,\ldots,f_n$.

Polynomials $f_1,\ldots,f_n$ whose greatest common divisor is 1 are said to
be relatively prime.


XI, §2. EXERCISES 

1. Show that $t^n - 1$ is divisible by $t - 1$.

2. Show that $t^4 + 4$ can be factored as a product of polynomials of
degree 2 with integer coefficients.

3. If n is odd, find the quotient of $t^n + 1$ by $t + 1$.

4. Let A be an $n \times n$ matrix over a field K, and let J be the set of
all polynomials f(t) in K[t] such that $f(A) = O$. Show that J is an ideal.


XI, §3. UNIQUE FACTORIZATION 

A polynomial p in K[t] will be said to be irreducible (over K) if it is of
degree $\geq 1$, and if, given a factorization $p = fg$ with
$f, g \in K[t]$, then $\deg f$ or $\deg g = 0$ (i.e. one of f, g is
constant). Thus, up to a non-zero constant factor, the only divisors of p
are p itself, and 1.

Example 1. The only irreducible polynomials over the complex
numbers are the polynomials of degree 1, i.e. non-zero constant multiples
of polynomials of type $t - \alpha$, with $\alpha \in \mathbf{C}$.

Example 2. The polynomial $t^2 + 1$ is irreducible over R.

Theorem 3.1. Every polynomial in K[t] of degree $\geq 1$ can be expressed
as a product $p_1 \cdots p_m$ of irreducible polynomials. In such a product,
the polynomials $p_1,\ldots,p_m$ are uniquely determined, up to a
rearrangement, and up to non-zero constant factors.

Proof. We first prove the existence of the factorization into a product
of irreducible polynomials. Let f be in K[t], of degree $\geq 1$. If f is
irreducible, we are done. Otherwise, we can write

$$f = gh$$

where $\deg g < \deg f$ and $\deg h < \deg f$. If g, h are irreducible, we
are done. Otherwise, we further factor g and h into polynomials of lower
degree. We cannot continue this process indefinitely, and hence there
exists a factorization for f. (We can obviously phrase the proof as an
induction.)

We must now prove uniqueness. We need a lemma. 

Lemma 3.2. Let p be irreducible in K[t]. Let $f, g \in K[t]$ be non-zero
polynomials, and assume p divides fg. Then p divides f or p divides g.

Proof. Assume that p does not divide f. Then the greatest common
divisor of p and f is 1, and there exist polynomials $h_1$, $h_2$ in K[t]
such that

$$1 = h_1p + h_2f.$$

(We use Theorem 2.2.) Multiplying by g yields

$$g = gh_1p + h_2fg.$$

But $fg = ph_3$ for some $h_3$, whence

$$g = (gh_1 + h_2h_3)p,$$

and p divides g, as was to be shown.

The lemma will be applied when p divides a product of irreducible
polynomials $q_1 \cdots q_s$. In that case, p divides $q_1$ or p divides
$q_2 \cdots q_s$. Hence there exists a constant c such that $p = cq_1$, or
p divides $q_2 \cdots q_s$. In the latter case, we can proceed inductively,
and we conclude that in any case, there exists some i such that p and
$q_i$ differ by a constant factor.

Suppose now that we have two products of irreducible polynomials

$$p_1 \cdots p_r = q_1 \cdots q_s.$$

After renumbering the $q_i$, we may assume that $p_1 = c_1q_1$ for some
constant $c_1$. Cancelling $q_1$, we obtain

$$c_1p_2 \cdots p_r = q_2 \cdots q_s.$$

Repeating our argument inductively, we conclude that there exist
constants $c_i$ such that $p_i = c_iq_i$ for all i, after making a possible
permutation of $q_1,\ldots,q_s$. This proves the desired uniqueness.

Corollary 3.3. Let f be a polynomial in K[t] of degree $\geq 1$. Then f
has a factorization $f = cp_1 \cdots p_s$, where $p_1,\ldots,p_s$ are
irreducible polynomials with leading coefficient 1, uniquely determined up
to a permutation.

Corollary 3.4. Let f be a polynomial in C[t], of degree $\geq 1$. Then f
has a factorization

$$f(t) = c(t - \alpha_1)\cdots(t - \alpha_n),$$

with $\alpha_i \in \mathbf{C}$ and $c \in \mathbf{C}$. The factors
$t - \alpha_i$ are uniquely determined up to a permutation.

We shall deal mostly with polynomials having leading coefficient 1.
Let f be such a polynomial of degree $\geq 1$. Let $p_1,\ldots,p_r$ be the
distinct irreducible polynomials (with leading coefficient 1) occurring in
its factorization. Then we can express f as a product

$$f = p_1^{i_1} \cdots p_r^{i_r},$$

where $i_1,\ldots,i_r$ are positive integers, uniquely determined by
$p_1,\ldots,p_r$. This factorization will be called a normalized
factorization for f. In particular, over the complex numbers, we can write

$$f(t) = (t - \alpha_1)^{i_1} \cdots (t - \alpha_r)^{i_r}.$$

A polynomial with leading coefficient 1 is sometimes called monic.

If p is irreducible, and $f = p^mg$, where p does not divide g, and m is an
integer $\geq 0$, then we say that m is the multiplicity of p in f. (We
define $p^0$ to be 1.) We denote this multiplicity by $\mathrm{ord}_p f$,
and also call it the order of f at p.

If $\alpha$ is a root of f, and

$$f(t) = (t - \alpha)^mg(t),$$

with $g(\alpha) \neq 0$, then $t - \alpha$ does not divide g(t), and m is
the multiplicity of $t - \alpha$ in f. We also say that m is the
multiplicity of $\alpha$ in f.

There is an easy test for m > 1 in terms of the derivative. 

Let $f(t) = a_nt^n + \cdots + a_0$ be a polynomial. Define its (formal)
derivative to be

$$Df(t) = f'(t) = na_nt^{n-1} + (n - 1)a_{n-1}t^{n-2} + \cdots + a_1.$$

Then we have the following statements, whose proofs are left as exercises.

(a) If f, g are polynomials, then

$$(f + g)' = f' + g'.$$

Also

$$(fg)' = f'g + fg'.$$

If c is constant, then $(cf)' = cf'$.

(b) Let $\alpha$ be a root of f and assume $\deg f \geq 1$. Show that the
multiplicity of $\alpha$ in f is > 1 if and only if $f'(\alpha) = 0$. Hence
if $f'(\alpha) \neq 0$, the multiplicity of $\alpha$ is 1.
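
The derivative test in (b) is easy to try on an example. The following sketch is an illustration only: polynomials are coefficient lists `[a_0, ..., a_n]`, the function names are chosen here, and the polynomial $t^3 - 3t + 2 = (t - 1)^2(t + 2)$ is picked because its roots are known in advance. The multiplicity is also computed directly, by dividing out $t - \alpha$ as often as possible, so the two criteria can be compared.

```python
def derivative(p):
    """Formal derivative of [a_0, ..., a_n], namely [a_1, 2*a_2, ..., n*a_n]."""
    return [k * p[k] for k in range(1, len(p))]

def evaluate(p, x):
    """Value of the polynomial at x, by Horner's rule."""
    result = 0
    for c in reversed(p):
        result = result * x + c
    return result

def divide_by_linear(p, alpha):
    """Quotient of p(t) by (t - alpha), assuming p(alpha) == 0."""
    q, carry = [], 0
    for c in reversed(p[1:]):      # a_n down to a_1 (synthetic division)
        carry = carry * alpha + c
        q.append(carry)
    return q[::-1]

def multiplicity(p, alpha):
    """Number of times (t - alpha) divides the non-zero polynomial p."""
    m = 0
    while len(p) > 1 and evaluate(p, alpha) == 0:
        p = divide_by_linear(p, alpha)
        m += 1
    return m

f = [2, -3, 0, 1]                  # t^3 - 3t + 2 = (t - 1)^2 (t + 2)
for alpha in (1, -2):
    print(alpha, evaluate(f, alpha), evaluate(derivative(f), alpha),
          multiplicity(f, alpha))
```

At the double root 1 both f and f' vanish, while at the simple root -2 the derivative does not vanish.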


XI, §3. EXERCISES 

1. Let f be a polynomial of degree 2 over a field K. Show that either f is
irreducible over K, or f has a factorization into linear factors over K.

2. Let f be a polynomial of degree 3 over a field K. If f is not irreducible
over K, show that f has a root in K.

3. Let f(t) be an irreducible polynomial with leading coefficient 1 over the
real numbers. Assume $\deg f = 2$. Show that f(t) can be written in the form

$$f(t) = (t - a)^2 + b^2$$

with some $a, b \in \mathbf{R}$ and $b \neq 0$. Conversely, prove that any
such polynomial is irreducible over R.

4. Let f be a polynomial with complex coefficients, say

$$f(t) = \alpha_nt^n + \cdots + \alpha_0.$$

Define its complex conjugate,

$$\bar{f}(t) = \bar{\alpha}_nt^n + \cdots + \bar{\alpha}_0$$

by taking the complex conjugate of each coefficient. Show that if f, g are
in C[t], then

$$\overline{(f + g)} = \bar{f} + \bar{g}, \qquad \overline{(fg)} = \bar{f}\bar{g},$$

and if $\beta \in \mathbf{C}$, then $\overline{(\beta f)} = \bar{\beta}\bar{f}$.

5. Let f(t) be a polynomial with real coefficients. Let $\alpha$ be a root of
f, which is complex but not real. Show that $\bar{\alpha}$ is also a root
of f.

6. Terminology being as in Exercise 5, show that the multiplicity of
$\alpha$ in f is the same as that of $\bar{\alpha}$.

7. Let A be an $n \times n$ matrix in a field K. Let J be the set of
polynomials f in K[t] such that $f(A) = O$. Show that J is an ideal. The
monic generator of J is called the minimal polynomial of A over K. A
similar definition is made if A is a linear map of a finite dimensional
vector space V into itself.

8. Let V be a finite dimensional space over K. Let $A\colon V \to V$ be a
linear map. Let f be its minimal polynomial. If A can be diagonalized (i.e.
if there exists a basis of V consisting of eigenvectors of A), show that
the minimal polynomial is equal to the product

$$(t - \alpha_1)\cdots(t - \alpha_r),$$

where $\alpha_1,\ldots,\alpha_r$ are the distinct eigenvalues of A.

9. Show that the following polynomials have no multiple roots in C.

(a) $t^4 + t$ (b) $t^5 - 5t + 1$

(c) any polynomial $t^2 + bt + c$ if b, c are numbers such that
$b^2 - 4c$ is not 0.

10. Show that the polynomial $t^n - 1$ has no multiple roots in C. Can you
determine all the roots and give its factorization into factors of
degree 1?

11. Let f, g be polynomials in K[t], and assume that they are relatively
prime. Show that one can find polynomials $f_1$, $g_1$ such that the
determinant

$$\begin{vmatrix} f & g \\ f_1 & g_1 \end{vmatrix}$$

is equal to 1.

12. Let $f_1$, $f_2$, $f_3$ be polynomials in K[t] and assume that they
generate the unit ideal. Show that one can find polynomials $f_{ij}$ in
K[t] such that the determinant

$$\begin{vmatrix}
f_1 & f_2 & f_3 \\
f_{21} & f_{22} & f_{23} \\
f_{31} & f_{32} & f_{33}
\end{vmatrix}$$

is equal to 1.

13. Let $\alpha$ be a complex number, and let J be the set of all polynomials
f(t) in K[t] such that $f(\alpha) = 0$. Show that J is an ideal. Assume that
J is not the zero ideal. Show that the monic generator of J is irreducible.

14. Let f, g be two polynomials, written in the form

$$f = p_1^{i_1} \cdots p_r^{i_r}$$

and

$$g = p_1^{j_1} \cdots p_r^{j_r},$$

where $i_\nu$, $j_\nu$ are integers $\geq 0$, and $p_1,\ldots,p_r$ are
distinct irreducible polynomials.

(a) Show that the greatest common divisor of f and g can be expressed as a
product $p_1^{k_1} \cdots p_r^{k_r}$ where $k_1,\ldots,k_r$ are integers
$\geq 0$. Express $k_\nu$ in terms of $i_\nu$ and $j_\nu$.

(b) Define the least common multiple of polynomials, and express the least
common multiple of f and g as a product $p_1^{k_1} \cdots p_r^{k_r}$ with
integers $k_\nu \geq 0$. Express $k_\nu$ in terms of $i_\nu$ and $j_\nu$.

15. Give the greatest common divisor and least common multiple of the follow¬ 
ing pairs of polynomials: 

(a) (t - 2)\t - 3 )\t - i) and (f - l)(t - 2)(f - 3) 3 

(b) $(t^2 + 1)(t^2 - 1)$ and $(t + i)^3(t^3 - 1)$


XI, §4. APPLICATION TO THE DECOMPOSITION 
OF A VECTOR SPACE 

Let V be a vector space over the field K, and let $A\colon V \to V$ be an
operator of V. Let W be a subspace of V. We shall say that W is an invariant
subspace under A if Aw lies in W for each w in W, i.e. if AW is contained
in W.

Example 1. Let $v_1$ be a non-zero eigenvector of A, and let $V_1$ be the
1-dimensional space generated by $v_1$. Then $V_1$ is an invariant subspace
under A.

Example 2. Let $\lambda$ be an eigenvalue of A, and let $V_\lambda$ be the
subspace of V consisting of all $v \in V$ such that $Av = \lambda v$. Then
$V_\lambda$ is an invariant subspace under A, called the eigenspace of
$\lambda$.

Example 3. Let $f(t) \in K[t]$ be a polynomial, and let W be the kernel
of f(A). Then W is an invariant subspace under A.

Proof. Suppose that $f(A)w = O$. Since $tf(t) = f(t)t$, we get

$$Af(A) = f(A)A,$$

whence

$$f(A)(Aw) = f(A)Aw = Af(A)w = O.$$

Thus Aw is also in the kernel of f(A), thereby proving our assertion.
Remark in general that for any two polynomials f, g we have

$$f(A)g(A) = g(A)f(A)$$

because $f(t)g(t) = g(t)f(t)$. We use this frequently in the sequel.

We shall now describe how the factorization of a polynomial into two 
factors whose greatest common divisor is 1, gives rise to a decomposition 
of the vector space V into a direct sum of invariant subspaces. 

Theorem 4.1. Let $f(t) \in K[t]$ be a polynomial, and suppose that
$f = f_1f_2$, where $f_1$, $f_2$ are polynomials of degree $\geq 1$, and
greatest common divisor equal to 1. Let $A\colon V \to V$ be an operator.
Assume that $f(A) = O$. Let

$$W_1 = \text{kernel of } f_1(A) \quad\text{and}\quad W_2 = \text{kernel of } f_2(A).$$

Then V is the direct sum of $W_1$ and $W_2$.

Proof. By assumption, there exist polynomials $g_1$, $g_2$ such that

$$g_1(t)f_1(t) + g_2(t)f_2(t) = 1.$$

Hence

$$(*) \qquad g_1(A)f_1(A) + g_2(A)f_2(A) = I.$$

Let $v \in V$. Then

$$v = g_1(A)f_1(A)v + g_2(A)f_2(A)v.$$

The first term in this sum belongs to $W_2$, because

$$f_2(A)g_1(A)f_1(A)v = g_1(A)f_1(A)f_2(A)v = g_1(A)f(A)v = O.$$

Similarly, the second term in this sum belongs to $W_1$. Thus V is the sum
of $W_1$ and $W_2$.

To show that this sum is direct, we must prove that an expression

$$v = w_1 + w_2$$

with $w_1 \in W_1$ and $w_2 \in W_2$, is uniquely determined by v. Applying
$g_1(A)f_1(A)$ to this sum, we find

$$g_1(A)f_1(A)v = g_1(A)f_1(A)w_2,$$

because $f_1(A)w_1 = O$. Applying the expression (*) to $w_2$ itself, we
find

$$w_2 = g_1(A)f_1(A)w_2$$

because $f_2(A)w_2 = O$. Consequently

$$w_2 = g_1(A)f_1(A)v,$$

and hence $w_2$ is uniquely determined. Similarly, $w_1 = g_2(A)f_2(A)v$ is
uniquely determined, and the sum is therefore direct. This proves our
theorem. 
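
The proof is constructive: the operators $g_2(A)f_2(A)$ and $g_1(A)f_1(A)$ are the projections on $W_1$ and $W_2$. The sketch below works this out for one small case chosen here for illustration, namely $f = t^2 - 1$ with $f_1 = t - 1$, $f_2 = t + 1$, $g_1 = -1/2$, $g_2 = 1/2$, and the matrix A that interchanges the two coordinates (so that $A^2 = I$). The helper names are assumptions of this example.

```python
from fractions import Fraction

def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def mat_scale(c, a):
    return [[c * x for x in row] for row in a]

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def apply(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]

I = [[1, 0], [0, 1]]
A = [[0, 1], [1, 0]]                  # A^2 = I, so f(A) = O for f = t^2 - 1

f1_A = mat_add(A, mat_scale(-1, I))   # f_1(A) = A - I
f2_A = mat_add(A, I)                  # f_2(A) = A + I
half = Fraction(1, 2)
E2 = mat_scale(-half, f1_A)           # g_1(A) f_1(A) with g_1 = -1/2
E1 = mat_scale(half, f2_A)            # g_2(A) f_2(A) with g_2 = 1/2

v = [3, 1]
w1 = apply(E1, v)                     # component in W_1 = kernel of f_1(A)
w2 = apply(E2, v)                     # component in W_2 = kernel of f_2(A)
print(w1, w2)
```

Here $W_1$ is the (+1)-eigenspace and $W_2$ the (-1)-eigenspace of A, and the identity $E_1 + E_2 = I$ is exactly the relation (*) of the proof.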

Theorem 4.1 applies as well when f is expressed as a product of several
factors. We state the result over the complex numbers.

Theorem 4.2. Let V be a vector space over C, and let $A\colon V \to V$ be an
operator. Let P(t) be a polynomial such that $P(A) = O$, and let

$$P(t) = (t - \alpha_1)^{m_1} \cdots (t - \alpha_r)^{m_r}$$

be its factorization, the $\alpha_1,\ldots,\alpha_r$ being the distinct
roots. Let $W_i$ be the kernel of $(A - \alpha_iI)^{m_i}$. Then V is the
direct sum of the subspaces $W_1,\ldots,W_r$.

Proof. The proof can be done by induction, splitting off the factors
$(t - \alpha_1)^{m_1}$, $(t - \alpha_2)^{m_2}, \ldots$, one by one. Let

$$W_1 = \text{Kernel of } (A - \alpha_1I)^{m_1},$$

$$W = \text{Kernel of } (A - \alpha_2I)^{m_2} \cdots (A - \alpha_rI)^{m_r}.$$

By Theorem 4.1 we obtain a direct sum decomposition $V = W_1 \oplus W$.
Now, inductively, we can assume that W is expressed as a direct sum

$$W = W_2 \oplus \cdots \oplus W_r,$$

where $W_j$ ($j = 2,\ldots,r$) is the kernel of $(A - \alpha_jI)^{m_j}$ in
W. Then

$$V = W_1 \oplus W_2 \oplus \cdots \oplus W_r$$

is a direct sum. We still have to prove that $W_j$ ($j = 2,\ldots,r$) is
the kernel of $(A - \alpha_jI)^{m_j}$ in V. Let

$$v = w_1 + w_2 + \cdots + w_r$$

be an element of V, with $w_i \in W_i$, and such that v is in the kernel of
$(A - \alpha_jI)^{m_j}$. Then in particular, v is in the kernel of

$$(A - \alpha_2I)^{m_2} \cdots (A - \alpha_rI)^{m_r},$$

whence v must be in W, and consequently $w_1 = 0$. Since v lies in W, we
can now conclude that $v = w_j$ because W is the direct sum of
$W_2,\ldots,W_r$.

Example 4. Differential equations. Let V be the space of (infinitely
differentiable) solutions of the differential equation

$$D^nf + a_{n-1}D^{n-1}f + \cdots + a_0f = 0,$$

with constant complex coefficients $a_i$.

Theorem 4.3. Let

$$P(t) = t^n + a_{n-1}t^{n-1} + \cdots + a_0.$$

Factor P(t) as in Theorem 4.2,

$$P(t) = (t - \alpha_1)^{m_1} \cdots (t - \alpha_r)^{m_r}.$$

Then V is the direct sum of the spaces of solutions of the differential
equations

$$(D - \alpha_iI)^{m_i}f = 0,$$

for $i = 1,\ldots,r$.

Proof. This is merely a direct application of Theorem 4.2.

Thus the study of the original differential equation is reduced to the
study of the much simpler equation

$$(D - \alpha I)^mf = 0.$$

The solutions of this equation are easily found.


Theorem 4.4. Let $\alpha$ be a complex number. Let $W$ be the space of
solutions of the differential equation

$$(D - \alpha I)^m f = 0.$$

Then $W$ is the space generated by the functions

$$e^{\alpha t},\ t e^{\alpha t},\ \ldots,\ t^{m-1} e^{\alpha t},$$

and these functions form a basis for this space, which therefore has
dimension $m$.


Proof. For any complex $\alpha$ we have

$$(D - \alpha I)^m f = e^{\alpha t} D^m (e^{-\alpha t} f).$$

(The proof is a simple induction.) Consequently, $f$ lies in the kernel of
$(D - \alpha I)^m$ if and only if

$$D^m (e^{-\alpha t} f) = 0.$$

The only functions whose $m$-th derivative is 0 are the polynomials of
degree $\leq m - 1$. Hence the space of solutions of $(D - \alpha I)^m f = 0$ is the
space generated by the functions

$$e^{\alpha t},\ t e^{\alpha t},\ \ldots,\ t^{m-1} e^{\alpha t}.$$

Finally these functions are linearly independent. Suppose we have a
linear relation

$$c_0 e^{\alpha t} + c_1 t e^{\alpha t} + \cdots + c_{m-1} t^{m-1} e^{\alpha t} = 0$$

for all $t$, with constants $c_0, \ldots, c_{m-1}$. Let

$$Q(t) = c_0 + c_1 t + \cdots + c_{m-1} t^{m-1}.$$

Then $Q(t)$ is a polynomial, and we have

$$Q(t) e^{\alpha t} = 0 \quad \text{for all } t.$$

But $e^{\alpha t} \neq 0$ for all $t$, so $Q(t) = 0$ for all $t$. Since $Q$ is a polynomial, we
must have $c_i = 0$ for $i = 0, \ldots, m - 1$, thus concluding the proof.
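
The key identity in the proof says that $D - \alpha I$ acts on a function $Q(t)e^{\alpha t}$ simply by differentiating the polynomial factor $Q$. The sketch below, in Python, represents such a function by the coefficient list of $Q$ and checks the theorem for $m = 3$; the function names, the value of `alpha`, and the coefficient-list representation are assumptions made for this illustration.

```python
def apply_D(Q, alpha):
    """D applied to Q(t) e^{alpha t}; returns the coefficients of Q' + alpha*Q."""
    derivative = [k * Q[k] for k in range(1, len(Q))] + [0]
    return [d + alpha * q for d, q in zip(derivative, Q)]

def apply_D_minus_alpha(Q, alpha, m=1):
    """(D - alpha I)^m applied to Q(t) e^{alpha t}, as a new coefficient list."""
    for _ in range(m):
        DQ = apply_D(Q, alpha)
        Q = [d - alpha * q for d, q in zip(DQ, Q)]
    return Q

alpha = complex(2, -1)   # illustrative complex number
m = 3

# Coefficient lists of 1, t, t^2: the polynomial factors of the claimed basis.
basis = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
images = [apply_D_minus_alpha(Q, alpha, m) for Q in basis]

# t^3 e^{alpha t} is not a solution: three applications leave the constant 6.
outside = apply_D_minus_alpha([0, 0, 0, 1], alpha, m)
print(images, outside)
```

Each basis function is sent to the zero function, while $t^3 e^{\alpha t}$ is sent to $6e^{\alpha t}$, in agreement with the statement that the solution space has dimension exactly $m$.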


XI, §4. EXERCISES 

1. In Theorem 4.1 show that image of $f_1(A)$ = kernel of $f_2(A)$.

2. Let $A: V \to V$ be an operator, and $V$ finite dimensional. Suppose that $A^3 = A$.
Show that $V$ is the direct sum

$$V = V_0 \oplus V_1 \oplus V_{-1},$$

where $V_0 = \operatorname{Ker} A$, $V_1$ is the $(+1)$-eigenspace of $A$, and $V_{-1}$ is the
$(-1)$-eigenspace of $A$.

3. Let $A: V \to V$ be an operator, and $V$ finite dimensional. Suppose that the
characteristic polynomial of $A$ has the factorization

$$P_A(t) = (t - \alpha_1) \cdots (t - \alpha_n),$$

where $\alpha_1, \ldots, \alpha_n$ are distinct elements of the field $K$. Show that $V$ has a basis
consisting of eigenvectors for $A$.


XI, §5. SCHUR’S LEMMA 

Let $V$ be a vector space over $K$, and let $S$ be a set of operators of $V$.
Let $W$ be a subspace of $V$. We shall say that $W$ is an $S$-invariant
subspace if $BW$ is contained in $W$ for all $B$ in $S$. We shall say that $V$ is a
simple $S$-space if $V \neq \{O\}$ and if the only $S$-invariant subspaces are $V$
itself and the zero subspace.

Remark 1. Let $A: V \to V$ be an operator such that $AB = BA$ for all
$B \in S$. Then the image and kernel of $A$ are $S$-invariant subspaces of $V$.

Proof. Let $w$ be in the image of $A$, say $w = Av$ with some $v \in V$. Then
$Bw = BAv = ABv$. This shows that $Bw$ is also in the image of $A$, and
hence that the image of $A$ is $S$-invariant. Let $u$ be in the kernel of $A$.
Then $ABu = BAu = O$. Hence $Bu$ is also in the kernel, which is therefore
an $S$-invariant subspace.

Remark 2. Let $S$ be as above, and let $A: V \to V$ be an operator. Assume
that $AB = BA$ for all $B \in S$. If $f$ is a polynomial in $K[t]$, then $f(A)B =
Bf(A)$ for all $B \in S$.

Prove this as a simple exercise.

Theorem 5.1. Let $V$ be a vector space over $K$, and let $S$ be a set of
operators of $V$. Assume that $V$ is a simple $S$-space. Let $A: V \to V$ be a
linear map such that $AB = BA$ for all $B$ in $S$. Then either $A$ is
invertible or $A$ is the zero map.

Proof. Assume $A \neq O$. By Remark 1, the kernel and the image of $A$ are
$S$-invariant subspaces. Since $A \neq O$, the kernel is not $V$ and the image is
not $\{O\}$. Because $V$ is simple, the kernel of $A$ is $\{O\}$, and its
image is all of $V$. Hence $A$ is invertible.

Theorem 5.2. Let $V$ be a finite dimensional vector space over the
complex numbers. Let $S$ be a set of operators of $V$, and assume that $V$ is a
simple $S$-space. Let $A: V \to V$ be a linear map such that $AB = BA$ for
all $B$ in $S$. Then there exists a number $\lambda$ such that $A = \lambda I$.

Proof. Let $J$ be the ideal of polynomials $f$ in $\mathbf{C}[t]$ such that
$f(A) = O$. Let $g$ be a generator for this ideal, with leading coefficient 1.
Then $g \neq 0$. We contend that $g$ is irreducible. Otherwise, we can write
$g = h_1 h_2$ with polynomials $h_1$, $h_2$ of degrees $< \deg g$. Consequently
$h_1(A) \neq O$. By Theorem 5.1, and Remarks 1, 2 we conclude that $h_1(A)$ is
invertible. Similarly, $h_2(A)$ is invertible. Hence $h_1(A)h_2(A)$ is invertible,
an impossibility because $h_1(A)h_2(A) = g(A) = O$. This proves that $g$ must be
irreducible. But
the only irreducible polynomials over the complex numbers are of degree
1, and hence $g(t) = t - \lambda$ for some $\lambda \in \mathbf{C}$. Since $g(A) = O$, we conclude
that $A - \lambda I = O$, whence $A = \lambda I$, as was to be shown.
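
A small experiment illustrates Theorem 5.2. Take $V = \mathbf{C}^2$ and let $S$ consist of the two matrices $E_{12}$ and $E_{21}$ having a single entry 1 off the diagonal; one checks directly that no line in $V$ is mapped into itself by both, so $V$ is a simple $S$-space. The sketch below, in Python, searches the $2 \times 2$ matrices with integer entries between $-2$ and $2$ for those commuting with both elements of $S$. The choice of $S$, the search range and the helper names are assumptions for illustration, and a finite search is of course not a proof.

```python
from itertools import product

def matmul2(X, Y):
    """Product of two 2 x 2 matrices given as tuples of rows."""
    return tuple(tuple(sum(X[i][k] * Y[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))

E12 = ((0, 1), (0, 0))
E21 = ((0, 0), (1, 0))
S = [E12, E21]

def commutes_with_S(A):
    """True when AB = BA for every B in S."""
    return all(matmul2(A, B) == matmul2(B, A) for B in S)

# All matrices with entries in {-2, ..., 2} that commute with S.
commuting = [((a, b), (c, d))
             for a, b, c, d in product(range(-2, 3), repeat=4)
             if commutes_with_S(((a, b), (c, d)))]
print(commuting)
```

Only the scalar matrices $\lambda I$ survive the search, as the theorem predicts.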


XI, §5. EXERCISES 


1. Let $V$ be a finite dimensional vector space over the field $K$, and let $S$ be the
set of all linear maps of $V$ into itself. Show that $V$ is a simple $S$-space.

2. Let $V = \mathbf{R}^2$, let $S$ consist of the matrix

$$\begin{pmatrix} 1 & a \\ 0 & 1 \end{pmatrix}$$

viewed as linear map of $V$ into itself. Here, $a$ is a fixed non-zero real
number. Determine all $S$-invariant subspaces of $V$.

3. Let $V$ be a vector space over the field $K$, and let $\{v_1, \ldots, v_n\}$ be a basis of $V$.
For each permutation $\sigma$ of $\{1, \ldots, n\}$ let $A_\sigma: V \to V$ be the linear map such that

$$A_\sigma(v_i) = v_{\sigma(i)}.$$

(a) Show that for any two permutations $\sigma$, $\tau$ we have

$$A_\sigma A_\tau = A_{\sigma\tau},$$

and $A_{\mathrm{id}} = I$.

(b) Show that the subspace generated by $v = v_1 + \cdots + v_n$ is an invariant
subspace for the set $S_n$ consisting of all $A_\sigma$.

(c) Show that the element $v$ of part (b) is an eigenvector of each $A_\sigma$. What is
the eigenvalue of $A_\sigma$ belonging to $v$?

(d) Let $n = 2$, and let $\sigma$ be the permutation which is not the identity. Show
that $v_1 - v_2$ generates a 1-dimensional subspace which is invariant under
$A_\sigma$. Show that $v_1 - v_2$ is an eigenvector of $A_\sigma$. What is the eigenvalue?

4. Let $V$ be a vector space over the field $K$, and let $A: V \to V$ be an operator.
Assume that $A^r = I$ for some integer $r \geq 1$. Let $T = I + A + \cdots + A^{r-1}$. Let
$v_0$ be an element of $V$. Show that the space generated by $Tv_0$ is an invariant
subspace of $A$, and that $Tv_0$ is an eigenvector of $A$. If $Tv_0 \neq O$, what is the
eigenvalue?

5. Let $V$ be a vector space over the field $K$, and let $S$ be a set of operators of $V$.
Let $U$, $W$ be $S$-invariant subspaces of $V$. Show that $U + W$ and $U \cap W$ are
$S$-invariant subspaces.


XI, §6. THE JORDAN NORMAL FORM 

In Chapter X, §1 we proved that a linear map over the complex numbers 
can always be triangularized. This result suffices for many applications, 
but it is possible to improve it and find a basis such that the matrix of 
the linear map has an exceptionally simple triangular form. We do this 
now, using the primary decomposition. 

We first consider a special case, which turns out to be rather typical
afterwards. Let $V$ be a vector space over the complex numbers. Let
$A: V \to V$ be a linear map. Let $\alpha \in \mathbf{C}$ and let $v \in V$, $v \neq O$. We shall say
that $v$ is $(A - \alpha I)$-cyclic if there exists an integer $r \geq 1$ such that
$(A - \alpha I)^r v = O$. The smallest positive integer $r$ having this property will
then be called a period of $v$ relative to $A - \alpha I$. If $r$ is such a period, then
we have $(A - \alpha I)^k v \neq O$ for any integer $k$ such that $0 \leq k < r$.

Lemma 6.1. If $v \neq O$ is $(A - \alpha I)$-cyclic, with period $r$, then the elements

$$v,\ (A - \alpha I)v,\ \ldots,\ (A - \alpha I)^{r-1} v$$

are linearly independent.

Proof. Let $B = A - \alpha I$ for simplicity. A relation of linear dependence
between the above elements can be written

$$f(B)v = O,$$

where $f$ is a polynomial $\neq 0$ of degree $\leq r - 1$, namely

$$c_0 v + c_1 Bv + \cdots + c_s B^s v = O,$$

with $f(t) = c_0 + c_1 t + \cdots + c_s t^s$, and $s \leq r - 1$. We also have $B^r v = O$ by
hypothesis. Let $g(t) = t^r$. If $h$ is the greatest common divisor of $f$ and $g$,
then we can write

$$h = f_1 f + g_1 g,$$

where $f_1$, $g_1$ are polynomials, and thus $h(B) = f_1(B)f(B) + g_1(B)g(B)$. It
follows that $h(B)v = O$. But $h(t)$ divides $t^r$ and is of degree $\leq r - 1$,
so that $h(t) = t^d$ with $d < r$. This contradicts the hypothesis that $r$ is a
period of $v$, and proves the lemma.

The vector space $V$ will be called cyclic if there exists some number $\alpha$
and an element $v \in V$ which is $(A - \alpha I)$-cyclic and $v, Av, \ldots, A^{r-1}v$ generate
$V$. If this is the case, then Lemma 6.1 implies that

$$(*) \qquad \{(A - \alpha I)^{r-1} v,\ \ldots,\ (A - \alpha I)v,\ v\}$$

is a basis for $V$. With respect to this basis, the matrix of $A$ is then
particularly simple. Indeed, for each $k$ we have

$$A(A - \alpha I)^k v = (A - \alpha I)^{k+1} v + \alpha (A - \alpha I)^k v.$$

By definition, it follows that the associated matrix for $A$ with respect to
this basis is equal to the triangular matrix

$$\begin{pmatrix}
\alpha & 1 & 0 & \cdots & 0 & 0 \\
0 & \alpha & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & \alpha & 1 \\
0 & 0 & 0 & \cdots & 0 & \alpha
\end{pmatrix}.$$

This matrix has $\alpha$ on the diagonal, 1 above the diagonal, and 0
everywhere else. The reader will observe that $(A - \alpha I)^{r-1} v$ is an eigenvector
for $A$, with eigenvalue $\alpha$.
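
The properties of this matrix are easy to confirm numerically. The following Python sketch builds the block for illustrative values $\alpha = 5$ and $r = 4$ (both assumptions, as are the helper names) and computes the powers of $B = A - \alpha I$.

```python
def jordan_block(alpha, r):
    """r x r matrix with alpha on the diagonal and 1 just above it."""
    return [[alpha if i == j else (1 if j == i + 1 else 0)
             for j in range(r)] for i in range(r)]

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]

alpha, r = 5, 4
J = jordan_block(alpha, r)
B = [[J[i][j] - (alpha if i == j else 0) for j in range(r)] for i in range(r)]

# powers[k] holds B^k for k = 0, ..., r.
powers = [[[int(i == j) for j in range(r)] for i in range(r)]]
for _ in range(r):
    powers.append(matmul(powers[-1], B))

is_nilpotent = all(x == 0 for row in powers[r] for x in row)        # B^r = O
previous_nonzero = any(x != 0 for row in powers[r - 1] for x in row)  # B^(r-1) != O
first_column = [row[0] for row in J]   # image of the first basis vector under J
print(is_nilpotent, previous_nonzero, first_column)
```

The first column of the block is $\alpha$ times the first unit vector, which reflects the observation that the first element of the basis $(*)$ is an eigenvector with eigenvalue $\alpha$.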

The basis (*) is called a Jordan basis for V with respect to A. 

Suppose that $V$ is expressed as a direct sum of $A$-invariant subspaces,

$$V = V_1 \oplus \cdots \oplus V_m,$$

and suppose that each $V_i$ is cyclic. If we select a Jordan basis for each
$V_i$, then the sequence of these bases forms a basis for $V$, again called
a Jordan basis for $V$ with respect to $A$. With respect to this basis, the
matrix for $A$ therefore splits into blocks (Fig. 1).



In each block we have an eigenvalue $\alpha_i$ on the diagonal. We have 1
above the diagonal, and 0 everywhere else. This matrix is called the
Jordan normal form for $A$. Our main theorem in this section is that this
normal form can always be achieved, namely:

Theorem 6.2. Let $V$ be a finite dimensional space over the complex
numbers, and $V \neq \{O\}$. Let $A: V \to V$ be an operator. Then $V$ can be
expressed as a direct sum of $A$-invariant cyclic subspaces.

Proof. By Theorem 4.2 we may assume without loss of generality that
there exists a number $\alpha$ and an integer $r \geq 1$ such that $(A - \alpha I)^r = O$.
Let $B = A - \alpha I$. Then $B^r = O$. We assume that $r$ is the smallest such
integer. Then $B^{r-1} \neq O$. The subspace $BV$ is not equal to $V$ because its
dimension is strictly smaller than that of $V$. (For instance, there exists
some $w \in V$ such that $B^{r-1} w \neq O$. Let $v = B^{r-1} w$. Then $Bv = O$. Our
assertion follows from the dimension relation

$$\dim BV + \dim \operatorname{Ker} B = \dim V.)$$

By induction, we may write $BV$ as a direct sum of $A$-invariant (or
$B$-invariant) subspaces which are cyclic, say

$$BV = W_1 \oplus \cdots \oplus W_m,$$

such that $W_i$ has a basis consisting of elements $B^k w_i$ for some cyclic
vector $w_i \in W_i$ of period $r_i$. Let $v_i \in V$ be such that $Bv_i = w_i$. Then each $v_i$ is
a cyclic vector, because

$$\text{if } B^{r_i} w_i = O, \text{ then } B^{r_i + 1} v_i = O.$$

Let $V_i$ be the subspace of $V$ generated by the elements $B^k v_i$ for
$k = 0, \ldots, r_i$. We contend that the subspace $V'$ equal to the sum

$$V' = V_1 + \cdots + V_m$$

is a direct sum. We have to prove that any element $u$ in this sum can be
expressed uniquely in the form

$$u = u_1 + \cdots + u_m, \quad \text{with } u_i \in V_i.$$

Any element of $V_i$ is of type $f_i(B)v_i$ where $f_i$ is a polynomial, of degree
$\leq r_i$. Suppose that

$$(1) \qquad f_1(B)v_1 + \cdots + f_m(B)v_m = O.$$

Applying $B$ and noting that $Bf_i(B) = f_i(B)B$ we get

$$f_1(B)w_1 + \cdots + f_m(B)w_m = O.$$

But $W_1 + \cdots + W_m$ is a direct sum decomposition of $BV$, whence

$$f_i(B)w_i = O, \quad \text{all } i = 1, \ldots, m.$$

Therefore $t^{r_i}$ divides $f_i(t)$, and in particular $t$ divides $f_i(t)$. We can thus
write

$$f_i(t) = g_i(t)\,t$$

for some polynomial $g_i$, and hence $f_i(B) = g_i(B)B$. It follows from (1)
that

$$g_1(B)w_1 + \cdots + g_m(B)w_m = O.$$

Again, $t^{r_i}$ divides $g_i(t)$, whence $t^{r_i + 1}$ divides $f_i(t)$, and therefore
$f_i(B)v_i = O$. This proves what we wanted, namely that $V'$ is a direct
sum of $V_1, \ldots, V_m$.

From the construction of $V'$ we observe that $BV' = BV$, because any
element in $BV$ is of the form

$$f_1(B)w_1 + \cdots + f_m(B)w_m$$

with some polynomials $f_i$, and is therefore the image under $B$ of the
element

$$f_1(B)v_1 + \cdots + f_m(B)v_m,$$

which lies in $V'$. From this we shall conclude that

$$V = V' + \operatorname{Ker} B.$$

Indeed, let $v \in V$. Then $Bv = Bv'$ for some $v' \in V'$, and hence
$B(v - v') = O$. Thus

$$v = v' + (v - v'),$$

thus proving that $V = V' + \operatorname{Ker} B$. Of course this sum is not direct.
However, let $\mathcal{B}'$ be a Jordan basis of $V'$. We can extend $\mathcal{B}'$ to a basis of
$V$ by using elements of $\operatorname{Ker} B$. Namely, if $\{u_1, \ldots, u_s\}$ is a basis of $\operatorname{Ker} B$,
then

$$\{\mathcal{B}', u_{j_1}, \ldots, u_{j_l}\}$$

is a basis of $V$ for suitable indices $j_1, \ldots, j_l$. Each $u_j$ satisfies $Bu_j = O$,
whence $u_j$ is an eigenvector for $A$, and the one-dimensional space
generated by $u_j$ is $A$-invariant, and cyclic. We let this subspace be denoted by
$U_j$. Then we have

$$V = V' \oplus U_{j_1} \oplus \cdots \oplus U_{j_l}
   = V_1 \oplus \cdots \oplus V_m \oplus U_{j_1} \oplus \cdots \oplus U_{j_l},$$

thus giving the desired expression of $V$ as a direct sum of cyclic
subspaces. This proves our theorem.
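
The proof is an existence argument and does not by itself tell us the sizes of the blocks. A standard fact, not stated in the text, fills this gap: for an eigenvalue $\alpha$ and $B = A - \alpha I$, the number of cyclic subspaces of dimension at least $k$ belonging to $\alpha$ equals $\operatorname{rank} B^{k-1} - \operatorname{rank} B^k$. The Python sketch below applies this formula; the matrices and all helper names are assumptions chosen for illustration.

```python
from fractions import Fraction

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]

def rank(X):
    """Rank by Gauss-Jordan elimination over the rationals (exact)."""
    M = [[Fraction(x) for x in row] for row in X]
    r = 0
    for c in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                factor = M[i][c] / M[r][c]
                M[i] = [x - factor * y for x, y in zip(M[i], M[r])]
        r += 1
    return r

def jordan_block(alpha, r):
    return [[alpha if i == j else (1 if j == i + 1 else 0)
             for j in range(r)] for i in range(r)]

def block_diag(blocks):
    """Square matrix with the given square blocks along the diagonal."""
    n = sum(len(b) for b in blocks)
    M = [[0] * n for _ in range(n)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, x in enumerate(row):
                M[offset + i][offset + j] = x
        offset += len(b)
    return M

def block_sizes(A, alpha):
    """Sizes of the Jordan blocks of A for alpha, largest first."""
    n = len(A)
    B = [[A[i][j] - (alpha if i == j else 0) for j in range(n)] for i in range(n)]
    power = [[int(i == j) for j in range(n)] for i in range(n)]
    ranks = [n]                                   # ranks[k] = rank of B^k
    for _ in range(n + 1):
        power = matmul(power, B)
        ranks.append(rank(power))
    at_least = [ranks[k - 1] - ranks[k] for k in range(1, n + 2)]
    sizes = []
    for k in range(1, n + 1):
        # Blocks of size exactly k: (blocks >= k) minus (blocks >= k + 1).
        sizes += [k] * (at_least[k - 1] - at_least[k])
    return sorted(sizes, reverse=True)

A = block_diag([jordan_block(2, 2), jordan_block(2, 1), jordan_block(7, 3)])
C = [[1, 1], [-1, 3]]          # characteristic polynomial (t - 2)^2
sizes_A_2 = block_sizes(A, 2)
sizes_A_7 = block_sizes(A, 7)
sizes_C_2 = block_sizes(C, 2)
print(sizes_A_2, sizes_A_7, sizes_C_2)
```

The matrix `C` is not given in normal form, yet the ranks show that it has a single block of size 2 for the eigenvalue 2, so it is not diagonalizable.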


XI, §6. EXERCISES 

In the following exercises, we let $V$ be a finite dimensional vector space over the
complex numbers, and we let $A: V \to V$ be an operator.

1. Show that $A$ can be written in the form $A = D + N$, where $D$ is a
diagonalizable operator, $N$ is a nilpotent operator, and $DN = ND$.

2. Assume that $V$ is cyclic. Show that the subspace of $V$ generated by
eigenvectors of $A$ is one-dimensional.

3. Assume that $V$ is cyclic. Let $f$ be a polynomial. What are the eigenvalues of
$f(A)$ in terms of those of $A$? Same question when $V$ is not assumed cyclic.

4. If $A$ is nilpotent and not $O$, show that $A$ is not diagonalizable.

5. Let $P_A$ be the characteristic polynomial of $A$, and write it as a product

$$P_A(t) = \prod_{i=1}^{r} (t - \alpha_i)^{m_i},$$

where $\alpha_1, \ldots, \alpha_r$ are distinct. Let $f$ be a polynomial. Express the characteristic
polynomial $P_{f(A)}$ as a product of factors of degree 1.

A direct sum decomposition of matrices

6. Let $\mathrm{Mat}_n(\mathbf{C})$ be the vector space of $n \times n$ complex matrices. Let $E_{ij}$ for
$i, j = 1, \ldots, n$ be the matrix with $(ij)$-component 1, and all other components
0. Then the set of elements $E_{ij}$ is a basis for $\mathrm{Mat}_n(\mathbf{C})$. Let $D^*$ be the set of
diagonal matrices with non-zero diagonal components. We write such a matrix
as $\mathrm{diag}(a_1, \ldots, a_n) = a$. We define the conjugation action of $D^*$ on $\mathrm{Mat}_n(\mathbf{C})$
by

$$c(a)X = aXa^{-1}.$$

(a) Show that $a \mapsto c(a)$ is a map from $D^*$ into the automorphisms of $\mathrm{Mat}_n(\mathbf{C})$
(isomorphisms of $\mathrm{Mat}_n(\mathbf{C})$ with itself), satisfying

$$c(I) = \mathrm{id}, \quad c(ab) = c(a)c(b) \quad \text{and} \quad c(a^{-1}) = c(a)^{-1}.$$

A map satisfying these conditions is called a homomorphism.

(b) Show that each $E_{ij}$ is an eigenvector for the action of $c(a)$, the eigenvalue
being given by $\chi_{ij}(a) = a_i / a_j$.

Thus $\mathrm{Mat}_n(\mathbf{C})$ is a direct sum of eigenspaces. Each $\chi_{ij}: D^* \to \mathbf{C}^*$ is a
homomorphism of $D^*$ into the multiplicative group of complex numbers.

7. For two matrices $X, Y \in \mathrm{Mat}_n(\mathbf{C})$, define $[X, Y] = XY - YX$. Let $L_X$ denote the
map such that $L_X(Y) = [X, Y]$. One calls $L_X$ the bracket (or regular or Lie)
action of $X$.

(a) Show that for each $X$, the map $L_X: Y \mapsto [X, Y]$ is a linear map, satisfying
the Leibniz rule for derivations, that is $[X, [Y, Z]] = [[X, Y], Z] + [Y, [X, Z]]$.

(b) Let $D$ be the vector space of diagonal matrices. For each $H \in D$, show that
$E_{ij}$ is an eigenvector of $L_H$, with eigenvalue $\alpha_{ij}(H) = h_i - h_j$ (if $h_1, \ldots, h_n$ are
the diagonal components of $H$). Show that $\alpha_{ij}: D \to \mathbf{C}$ is linear. It is called
an eigencharacter of the bracket action.

(c) For two linear maps $A$, $B$ of a vector space $V$ into itself, define

$$[A, B] = AB - BA.$$

Show that $L_{[X, Y]} = [L_X, L_Y]$.


XII, §1. DEFINITIONS

Let $S$ be a subset of $\mathbf{R}^m$. We say that $S$ is convex if given points $P$, $Q$ in
$S$, the line segment joining $P$ to $Q$ is also contained in $S$.

We recall that the line segment joining $P$ to $Q$ is the set of all points
$P + t(Q - P)$ with $0 \leq t \leq 1$. Thus it is the set of points

$$(1 - t)P + tQ,$$

with $0 \leq t \leq 1$.

Theorem 1.1. Let $P_1, \ldots, P_n$ be points of $\mathbf{R}^m$. The set of all linear
combinations

$$x_1 P_1 + \cdots + x_n P_n$$

with $0 \leq x_i \leq 1$ and $x_1 + \cdots + x_n = 1$, is a convex set.

Theorem 1.2. Let $P_1, \ldots, P_n$ be points of $\mathbf{R}^m$. Any convex set which
contains $P_1, \ldots, P_n$ also contains all linear combinations

$$x_1 P_1 + \cdots + x_n P_n,$$

such that $0 \leq x_i \leq 1$ for all $i$, and $x_1 + \cdots + x_n = 1$.

Either work out the proofs as an exercise or look them up in Chapter
III, §5.

In view of Theorems 1.1 and 1.2, we conclude that the set of linear
combinations described in these theorems is the smallest convex set
containing all points $P_1, \ldots, P_n$.
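
The idea behind Theorem 1.1 is that a point of the segment between two such combinations is again such a combination, with the coefficients mixed in the same proportion. The following Python sketch checks this on one example with exact fractions; the points, the weights and the function names are assumptions made for illustration.

```python
from fractions import Fraction as F

def combine(weights, points):
    """Return x_1 P_1 + ... + x_n P_n, insisting that 0 <= x_i <= 1 and sum = 1."""
    assert all(0 <= w <= 1 for w in weights) and sum(weights) == 1
    dim = len(points[0])
    return tuple(sum(w * p[k] for w, p in zip(weights, points)) for k in range(dim))

def mix(t, x, y):
    """Coefficients of (1 - t)X + tY when X, Y have coefficients x, y."""
    return [(1 - t) * a + t * b for a, b in zip(x, y)]

points = [(0, 0), (4, 0), (0, 4)]
x = [F(1, 2), F(1, 4), F(1, 4)]
y = [F(0), F(0), F(1)]
t = F(1, 3)

X = combine(x, points)
Y = combine(y, points)
z = mix(t, x, y)                      # again non-negative, with sum 1
Z = combine(z, points)
on_segment = tuple((1 - t) * a + t * b for a, b in zip(X, Y))
print(Z, on_segment)
```

Both computations give the same point, so the point $(1 - t)X + tY$ of the segment is itself one of the linear combinations described in the theorem.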

The following statements have already occurred as exercises, and we 
recall them here for the sake of completeness. 

(1) If $S$ and $S'$ are convex sets, then the intersection $S \cap S'$ is convex.

(2) Let $F: \mathbf{R}^m \to \mathbf{R}^n$ be a linear map. If $S$ is convex in $\mathbf{R}^m$, then $F(S)$
(the image of $S$ under $F$) is convex in $\mathbf{R}^n$.

(3) Let $F: \mathbf{R}^m \to \mathbf{R}^n$ be a linear map. Let $S'$ be a convex set of $\mathbf{R}^n$.
Let $S = F^{-1}(S')$ be the set of all $X \in \mathbf{R}^m$ such that $F(X)$ lies in $S'$.
Then $S$ is convex.

Examples. Let $A$ be a vector in $\mathbf{R}^n$. The map $F$ such that $F(X) = A \cdot X$
is linear. Note that a point $c \in \mathbf{R}$ is a convex set. Hence the hyperplane
$H$ consisting of all $X$ such that $A \cdot X = c$ is convex.

Furthermore, the set $S'$ of all $x \in \mathbf{R}$ such that $x > c$ is convex. Hence
the set of all $X \in \mathbf{R}^n$ such that $A \cdot X > c$ is convex. It is called an open
half space. Similarly, the set of points $X \in \mathbf{R}^n$ such that $A \cdot X \geq c$ is called
a closed half space.

In the following picture, we have illustrated a hyperplane (line) in R 2 , 
and one half space determined by it. 



The line is defined by the equation $3x - 2y = -1$. It passes through the
point $P = (1, 2)$, and $N = (3, -2)$ is a vector perpendicular to the line.
We have shaded the half space of points $X$ such that $X \cdot N \leq -1$.

We see that a hyperplane whose equation is $X \cdot N = c$ determines two
closed half spaces, namely the spaces defined by the inequalities

$$X \cdot N \geq c \quad \text{and} \quad X \cdot N \leq c,$$

and similarly for the open half spaces.

Since the intersection of convex sets is convex, the intersection of a
finite number of half spaces is convex. In the next picture (Figs. 2 and
3), we have drawn intersections of a finite number of half planes. Such
an intersection can be bounded or unbounded. (We recall that a subset
$S$ of $\mathbf{R}^n$ is said to be bounded if there exists a number $c > 0$ such that
$\|X\| \leq c$ for all $X \in S$.)
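
Membership in a hyperplane, a half space, or an intersection of half spaces is decided by dot products alone. The Python sketch below uses the line $3x - 2y = -1$ with $N = (3, -2)$ from the example above; the triangle described by three half planes, and the function names, are assumptions added for illustration.

```python
def dot(X, Y):
    return sum(x * y for x, y in zip(X, Y))

N, c = (3, -2), -1            # the line 3x - 2y = -1
P = (1, 2)

def in_closed_half_space(X, normal, bound):
    """True when X . normal <= bound."""
    return dot(X, normal) <= bound

# Illustrative bounded intersection: the triangle x >= 0, y >= 0, x + y <= 4,
# each condition written in the form X . normal <= bound.
half_planes = [((-1, 0), 0),
               ((0, -1), 0),
               ((1, 1), 4)]

def in_intersection(X):
    return all(in_closed_half_space(X, n, b) for n, b in half_planes)

P_on_line = dot(P, N) == c
direction_is_perpendicular = dot((2, 3), N) == 0   # (2, 3) points along the line
grid = [(x, y) for x in range(-1, 6) for y in range(-1, 6)]
triangle_points = [X for X in grid if in_intersection(X)]
print(P_on_line, direction_is_perpendicular, len(triangle_points))
```

Every grid point found in the triangle has norm at most 4, in line with the remark that such an intersection can be bounded.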



Figure 2 



Figure 3 


XII, §2. SEPARATING HYPERPLANES 

Theorem 2.1. Let $S$ be a closed convex set in $\mathbf{R}^n$. Let $P$ be a point of
$\mathbf{R}^n$. Then either $P$ belongs to $S$, or there exists a hyperplane $H$ which
contains $P$, and such that $S$ is contained in one of the open half spaces
determined by $H$.


Proof. We use a fact from calculus. Suppose that $P$ does not belong
to $S$. We consider the function $f$ on the closed set $S$ given by

$$f(X) = \|X - P\|.$$

It is proved in a course in calculus (with $\epsilon$ and $\delta$) that this function has
a minimum on $S$. Let $Q$ be a point of $S$ such that

$$\|Q - P\| \leq \|X - P\|$$

for all $X$ in $S$. Let

$$N = Q - P.$$

Since $P$ is not in $S$, $Q - P \neq O$, and $N \neq O$. We contend that the
hyperplane passing through $P$, perpendicular to $N$, will satisfy our
requirements. Let $Q'$ be any point of $S$, and say $Q' \neq Q$. Then for every $t$ with
$0 < t \leq 1$ we have

$$\|Q - P\| \leq \|Q + t(Q' - Q) - P\| = \|(Q - P) + t(Q' - Q)\|.$$

Squaring gives

$$(Q - P)^2 \leq (Q - P)^2 + 2t(Q - P) \cdot (Q' - Q) + t^2 (Q' - Q)^2.$$

Canceling and dividing by $t$, we obtain

$$0 \leq 2(Q - P) \cdot (Q' - Q) + t(Q' - Q)^2.$$

Letting $t$ tend to 0 yields

$$0 \leq (Q - P) \cdot (Q' - Q) = N \cdot (Q' - P) + N \cdot (P - Q) = N \cdot (Q' - P) - N \cdot N.$$

But $N \cdot N > 0$. Hence

$$Q' \cdot N > P \cdot N.$$

This proves that $S$ is contained in the open half space defined by
$X \cdot N > P \cdot N$.
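
The construction in the proof can be followed on a concrete set. In the Python sketch below, $S$ is taken to be the closed unit disc in $\mathbf{R}^2$, for which the nearest point $Q$ to an outside point $P$ is $P/\|P\|$; the choice of $S$, the point $P$, the sample points and the function names are all assumptions for illustration, and the check uses floating-point arithmetic.

```python
import math

def dot(X, Y):
    return sum(x * y for x, y in zip(X, Y))

def nearest_point_in_unit_disc(P):
    """Point of the closed unit disc at minimal distance from P."""
    norm = math.hypot(*P)
    return P if norm <= 1 else (P[0] / norm, P[1] / norm)

P = (3.0, 4.0)                         # a point outside the disc
Q = nearest_point_in_unit_disc(P)      # (0.6, 0.8)
N = (Q[0] - P[0], Q[1] - P[1])         # N = Q - P, as in the proof

# Sample points of the disc on three circles of radius 0, 1/2 and 1.
samples = [(r * math.cos(a), r * math.sin(a))
           for r in (0.0, 0.5, 1.0)
           for a in (2 * math.pi * k / 12 for k in range(12))]

# X . N - P . N should be positive for every X in S.
margins = [dot(X, N) - dot(P, N) for X in samples]
print(min(margins), dot(N, N))
```

The proof in fact gives $Q' \cdot N \geq P \cdot N + N \cdot N$, and the smallest margin found among the samples is accordingly no smaller than $N \cdot N$.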


Let $S$ be a convex set in $\mathbf{R}^n$. Then the closure of $S$ (denoted by $\bar{S}$) is
convex.

This is easily proved, for if $P$, $Q$ are points in the closure, we can find
points of $S$, say $P_k$, $Q_k$ tending to $P$ and $Q$ respectively as a limit. Then
for $0 \leq t \leq 1$,

$$tP_k + (1 - t)Q_k$$

tends to $tP + (1 - t)Q$, which therefore lies in the closure of $S$.

Let $S$ be a convex set in $\mathbf{R}^n$. Let $P$ be a boundary point of $S$. (This
means a point such that for every $\epsilon > 0$, the open ball centered at $P$, of
radius $\epsilon$ in $\mathbf{R}^n$ contains points which are in $S$, and points which are not
in $S$.) A hyperplane $H$ is said to be a supporting hyperplane of $S$ at $P$ if
$P$ is contained in $H$, and if $S$ is contained in one of the two closed half
spaces determined by $H$.

Theorem 2.2. Let $S$ be a convex set in $\mathbf{R}^n$, and let $P$ be a boundary
point of $S$. Then there exists a supporting hyperplane of $S$ at $P$.

Proof. Let $\bar{S}$ be the closure of $S$. Then we saw that $\bar{S}$ is convex, and
$P$ is a boundary point of $\bar{S}$. If we can prove our theorem for $\bar{S}$, then it
certainly follows for $S$. Thus without loss of generality, we may assume
that $S$ is closed.

For each integer $k > 2$, we can find a point $P_k$ not in $S$, but at
distance $< 1/k$ from $P$. By Theorem 2.1, we find a point $Q_k$ on $S$ whose
distance from $P_k$ is minimal, and we let $N_k = Q_k - P_k$. Let $N'_k$ be the
vector in the same direction as $N_k$ but of norm 1. The sequence of
vectors $N'_k$ has a point of accumulation on the sphere of radius 1, say $N'$,
because the sphere is compact. We have by Theorem 2.1, for all $X \in S$,

$$X \cdot N_k > P_k \cdot N_k$$

for every $k$, whence dividing each side by the norm of $N_k$, we get

$$X \cdot N'_k > P_k \cdot N'_k$$

for every $k$. Since $N'$ is a point of accumulation of $\{N'_k\}$, and since $P$ is
a limit of $\{P_k\}$, it follows by continuity that for each $X$ in $S$,

$$X \cdot N' \geq P \cdot N'.$$

This proves our theorem.

Remark. Let $S$ be a convex set, and let $H$ be a hyperplane defined by
an equation

$$X \cdot N = a.$$

Assume that for all $X \in S$ we have $X \cdot N \geq a$. If $P$ is a point of $S$ lying in
the hyperplane, then $P$ is a boundary point of $S$. Otherwise, for $\epsilon > 0$
and $\epsilon$ sufficiently small, $P - \epsilon N$ would be a point of $S$, and thus

$$(P - \epsilon N) \cdot N = P \cdot N - \epsilon N \cdot N = a - \epsilon N \cdot N < a,$$

contrary to hypothesis. We conclude therefore that $H$ is a supporting
hyperplane of $S$ at $P$.


XII, §3. EXTREME POINTS AND SUPPORTING 
HYPERPLANES 

Let $S$ be a convex set and let $P$ be a point of $S$. We shall say that $P$
is an extreme point of $S$ if there do not exist points $Q_1$, $Q_2$ of $S$ with
$Q_1 \neq Q_2$ such that $P$ can be written in the form

$$P = tQ_1 + (1 - t)Q_2 \quad \text{with } 0 < t < 1.$$

In other words, $P$ cannot lie on a line segment contained in $S$ unless it is
one of the end-points of the line segment.

Theorem 3.1. Let $S$ be a closed convex set which is bounded. Then
every supporting hyperplane of $S$ contains an extreme point.

Proof. Let $H$ be a supporting hyperplane, defined by the equation
$X \cdot N = P_0 \cdot N$ at a boundary point $P_0$, and say $X \cdot N \geq P_0 \cdot N$ for all
$X \in S$. Let $T$ be the intersection of $S$ and the hyperplane. Then $T$ is
convex, closed, bounded. We contend that an extreme point of $T$ will
also be an extreme point of $S$. This will reduce our problem to finding
extreme points of $T$. To prove our contention let $P$ be an extreme point
of $T$, and suppose that we can write

$$P = tQ_1 + (1 - t)Q_2, \quad 0 < t < 1.$$

Dotting with $N$, and using the fact that $P$ is in the hyperplane, hence
$P \cdot N = P_0 \cdot N$, we obtain

$$(1) \qquad P_0 \cdot N = tQ_1 \cdot N + (1 - t)Q_2 \cdot N.$$

We have $Q_1 \cdot N$ and $Q_2 \cdot N \geq P_0 \cdot N$ since $Q_1$, $Q_2$ lie in $S$. If one of these
is $> P_0 \cdot N$, say $Q_1 \cdot N > P_0 \cdot N$, then the right-hand side of equation (1)
is

$$> tP_0 \cdot N + (1 - t)P_0 \cdot N = P_0 \cdot N,$$

and this is impossible. Hence both $Q_1$, $Q_2$ lie in the hyperplane, thereby
contradicting the hypothesis that $P$ is an extreme point of $T$.

We shall now find an extreme point of $T$. Among all points of $T$,
there is at least one point whose first coordinate is smallest, because $T$ is
closed and bounded. (We project on the first coordinate. The image
of $T$ under this projection has a greatest lower bound which is taken
on by an element of $T$ since $T$ is closed.) Let $T_1$ be the subset of $T$
consisting of all points whose first coordinate is equal to this smallest
one. Then $T_1$ is closed, and bounded. Hence we can find a point
of $T_1$ whose second coordinate is smallest among all points of $T_1$,
and the set $T_2$ of all points of $T_1$ having this second coordinate
is closed and bounded. We may proceed in this way until we
find a point $P$ of $T$ having successively smallest first, second, ..., $n$-th
coordinate. We assert that $P$ is an extreme point of $T$. Let
$P = (p_1, \ldots, p_n)$.

Suppose that we can write

$$P = tX + (1 - t)Y, \quad 0 < t < 1,$$

and points $X = (x_1, \ldots, x_n)$, $Y = (y_1, \ldots, y_n)$ in $T$. Then $x_1$ and $y_1 \geq p_1$,
and

$$p_1 = tx_1 + (1 - t)y_1.$$

If $x_1$ or $y_1 > p_1$, then

$$tx_1 + (1 - t)y_1 > tp_1 + (1 - t)p_1 = p_1,$$

which is impossible. Hence $x_1 = y_1 = p_1$. Proceeding inductively,
suppose we have proved $x_i = y_i = p_i$ for $i = 1, \ldots, r$. Then if $r < n$,

$$p_{r+1} = tx_{r+1} + (1 - t)y_{r+1},$$

and we may repeat the preceding argument. It follows that

$$X = Y = P,$$

whence $P$ is an extreme point, and our theorem is proved.
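
The selection of $T_1, T_2, \ldots$ is a lexicographic minimization, and it can be imitated on a finite set of points. The Python sketch below does so; the point set and the function names are assumptions for illustration, and the final check only tests a few values of $t$ on pairs from the finite set, so it is a sanity check rather than a proof of extremality.

```python
from fractions import Fraction as F
from itertools import permutations

def lexicographic_minimum(points):
    """Keep the points with smallest first coordinate, then second, and so on."""
    candidates = list(points)
    for k in range(len(candidates[0])):
        smallest = min(p[k] for p in candidates)
        candidates = [p for p in candidates if p[k] == smallest]   # the set T_(k+1)
    return candidates[0]

def is_combination(P, Q1, Q2, t):
    """True when P = t*Q1 + (1 - t)*Q2."""
    return all(p == t * a + (1 - t) * b for p, a, b in zip(P, Q1, Q2))

# Corners of a square, its center, and the midpoint (0, 1) of one side.
points = [(2, 0), (0, 1), (1, 1), (0, 0), (2, 2), (0, 2)]
P = lexicographic_minimum(points)

# P should not lie strictly inside a segment joining two distinct points.
lies_inside_a_segment = any(
    is_combination(P, Q1, Q2, t)
    for Q1, Q2 in permutations(points, 2)
    for t in (F(1, 4), F(1, 2), F(3, 4)))
print(P, lies_inside_a_segment)
```

Note that $(0, 1)$ also has the smallest first coordinate but is discarded at the second step; it is the midpoint of a side and not an extreme point.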


XII, §4. THE KREIN-MILMAN THEOREM 

Let $E$ be a set of points in $\mathbf{R}^n$ (with at least one point in it). We wish to
describe the smallest convex set containing $E$. We may say that it is the
intersection of all convex sets containing $E$, because this intersection is
convex, and is clearly smallest.

We can also describe this smallest convex set in another way. Let $E^c$
be the set of all linear combinations

$$t_1 P_1 + \cdots + t_m P_m$$

of points $P_1, \ldots, P_m$ in $E$ with real coefficients $t_i$ such that

$$0 \leq t_i \leq 1 \quad \text{and} \quad t_1 + \cdots + t_m = 1.$$

Then the set $E^c$ is convex. We leave the trivial verification to the reader.
Any convex set containing $E$ must contain $E^c$, and hence $E^c$ is the
smallest convex set containing $E$. We call $E^c$ the convex closure of $E$.

Let $S$ be a convex set and let $E$ be the set of its extreme points. Then
$E^c$ is contained in $S$. We ask for conditions under which $E^c = S$.

Geometrically speaking, extreme points can be either points like those
on the shell of an egg, or like points at the vertices of a polygon, viz.:




Figure 4 


Figure 5 


An unbounded convex set need not be the convex closure of its
extreme points, for instance the closed upper half plane, which has no
extreme points. Also, an open convex set need not be the convex closure
of its extreme points (the interior of the egg has no extreme points). The
Krein-Milman theorem states that if we eliminate these two possibilities,
then no other troubles can occur.


Theorem 4.1. Let $S$ be a closed, bounded, convex set. Then $S$ is the
smallest closed convex set containing the extreme points.


Proof. Let $S'$ be the intersection of all closed convex sets containing
the extreme points of $S$. Then $S' \subset S$. We must show that $S$ is
contained in $S'$. Let $P \in S$, and suppose $P \notin S'$. By Theorem 2.1, there
exists a hyperplane $H$ passing through $P$, defined by an equation

$$X \cdot N = c,$$

such that $X \cdot N > c$ for all $X \in S'$. Let $L: \mathbf{R}^n \to \mathbf{R}$ be the linear map such
that $L(X) = X \cdot N$. Then $L(P) = c$, and $L(P)$ is not contained in $L(S')$.
Since $S$ is closed and bounded, the image $L(S)$ is closed and bounded,
and this image is also convex. Hence $L(S)$ is a closed interval, say $[a, b]$,
containing $c$. Thus $a \leq c \leq b$. Let $H_a$ be the hyperplane defined by the
equation

$$X \cdot N = a.$$

By the remark following Theorem 2.2, we know that $H_a$ is a supporting
hyperplane of $S$. By Theorem 3.1, we conclude that $H_a$ contains an
extreme point of $S$. This extreme point is in $S'$. We then obtain a
contradiction of the fact that $X \cdot N > c \geq a$ for all $X$ in $S'$, and thus prove
the Krein-Milman theorem.


XII, §4. EXERCISES

1. Let $A$ be a vector in $\mathbf{R}^n$. Let $F: \mathbf{R}^n \to \mathbf{R}^n$ be the translation,

$$F(X) = X + A.$$

Show that if $S$ is convex in $\mathbf{R}^n$ then $F(S)$ is also convex.

2. Let $c$ be a number $> 0$, and let $P$ be a point in $\mathbf{R}^n$. Let $S$ be the set of
points $X$ such that $\|X - P\| < c$. Show that $S$ is convex. Similarly, show that
the set of points $X$ such that $\|X - P\| \leq c$ is convex.

3. Sketch the convex closure of the following sets of points.

(a) (1, 2), (1, -1), (1, 3), (-1, 1)

(b) (-1, 2), (2, 3), (-1, -1), (1, 0)

4. Let $L: \mathbf{R}^n \to \mathbf{R}^n$ be an invertible linear map. Let $S$ be convex in $\mathbf{R}^n$ and $P$ an
extreme point of $S$. Show that $L(P)$ is an extreme point of $L(S)$. Is the
assertion still true if $L$ is not invertible?

5. Prove that the intersection of a finite number of closed half spaces in $\mathbf{R}^n$ can
have only a finite number of extreme points.

6. Let $B$ be a column vector in $\mathbf{R}^n$, and $A$ an $n \times n$ matrix. Show that the set of
solutions of the linear equations $AX = B$ is a convex set in $\mathbf{R}^n$.


The complex numbers are a set of objects which can be added and
multiplied, the sum and product of two complex numbers being also a
complex number, and satisfy the following conditions.

(1) Every real number is a complex number, and if $\alpha$, $\beta$ are real
numbers, then their sum and product as complex numbers are
the same as their sum and product as real numbers.

(2) There is a complex number denoted by $i$ such that $i^2 = -1$.

(3) Every complex number can be written uniquely in the form
$a + bi$ where $a$, $b$ are real numbers.

(4) The ordinary laws of arithmetic concerning addition and
multiplication are satisfied. We list these laws:

If $\alpha$, $\beta$, $\gamma$ are complex numbers, then

$(\alpha\beta)\gamma = \alpha(\beta\gamma)$ and $(\alpha + \beta) + \gamma = \alpha + (\beta + \gamma)$.

We have $\alpha(\beta + \gamma) = \alpha\beta + \alpha\gamma$, and $(\beta + \gamma)\alpha = \beta\alpha + \gamma\alpha$.

We have $\alpha\beta = \beta\alpha$, and $\alpha + \beta = \beta + \alpha$.

If 1 is the real number one, then $1\alpha = \alpha$.

If 0 is the real number zero, then $0\alpha = 0$.

We have $\alpha + (-1)\alpha = 0$.

We shall now draw consequences of these properties. With each
complex number $a + bi$, we associate the vector $(a, b)$ in the plane. Let
$\alpha = a_1 + a_2 i$ and $\beta = b_1 + b_2 i$ be two complex numbers. Then

$$\alpha + \beta = a_1 + b_1 + (a_2 + b_2)i.$$

Hence addition of complex numbers is carried out "componentwise" and
corresponds to addition of vectors in the plane. For example,

$$(2 + 3i) + (-1 + 5i) = 1 + 8i.$$

In multiplying complex numbers, we use the rule $i^2 = -1$ to simplify
a product and to put it in the form $a + bi$. For instance, let $\alpha = 2 + 3i$
and $\beta = 1 - i$. Then

$$\begin{aligned}
\alpha\beta = (2 + 3i)(1 - i) &= 2(1 - i) + 3i(1 - i) \\
&= 2 - 2i + 3i - 3i^2 \\
&= 2 + i - 3(-1) \\
&= 2 + 3 + i \\
&= 5 + i.
\end{aligned}$$

Let $\alpha = a + bi$ be a complex number. We define $\bar{\alpha}$ to be $a - bi$. Thus
if $\alpha = 2 + 3i$, then $\bar{\alpha} = 2 - 3i$. The complex number $\bar{\alpha}$ is called the
conjugate of $\alpha$. We see at once that

$$\alpha\bar{\alpha} = a^2 + b^2.$$

With the vector interpretation of complex numbers, we see that $\alpha\bar{\alpha}$ is the
square of the distance of the point $(a, b)$ from the origin.

We now have one more important property of complex numbers,
which will allow us to divide by complex numbers other than 0.

If $\alpha = a + bi$ is a complex number $\neq 0$, and if we let

$$\lambda = \frac{\bar{\alpha}}{a^2 + b^2}$$

then $\alpha\lambda = \lambda\alpha = 1$.

The proof of this property is an immediate consequence of the law of
multiplication of complex numbers, because

$$\alpha \frac{\bar{\alpha}}{a^2 + b^2} = \frac{\alpha\bar{\alpha}}{a^2 + b^2} = 1.$$

The number $\lambda$ above is called the inverse of $\alpha$, and is denoted by $\alpha^{-1}$ or
$1/\alpha$. If $\alpha$, $\beta$ are complex numbers, we often write $\beta/\alpha$ instead of $\alpha^{-1}\beta$ (or
$\beta\alpha^{-1}$), just as we did with real numbers. We see that we can divide by
complex numbers $\neq 0$.

We define the absolute value of a complex number $\alpha = a_1 + ia_2$ to be

$$|\alpha| = \sqrt{a_1^2 + a_2^2}.$$

This absolute value is none other than the norm of the vector $(a_1, a_2)$.
In terms of absolute values, we can write

$$\alpha^{-1} = \frac{\bar{\alpha}}{|\alpha|^2}$$

provided $\alpha \neq 0$.

The triangle inequality for the norm of vectors can now be stated for
complex numbers. If $\alpha$, $\beta$ are complex numbers, then

$$|\alpha + \beta| \leq |\alpha| + |\beta|.$$

Another property of the absolute value is given in Exercise 5.
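
These rules can be tried out with the built-in complex numbers of Python, where $i$ is written `1j`. The sketch below repeats the worked examples of the text and computes an inverse by the formula $\bar{\alpha}/(a^2 + b^2)$; the function name `inverse` and the tolerance used in the comparison are assumptions of this illustration, since the division is carried out in floating-point arithmetic.

```python
alpha = complex(2, 3)      # 2 + 3i
beta = complex(1, -1)      # 1 - i

total = alpha + complex(-1, 5)          # componentwise addition
product = alpha * beta                  # uses i^2 = -1
norm_squared = alpha * alpha.conjugate()

def inverse(z):
    """Inverse of a non-zero complex number: conjugate divided by a^2 + b^2."""
    a, b = z.real, z.imag
    return z.conjugate() / (a * a + b * b)

lam = inverse(alpha)
is_inverse = abs(alpha * lam - 1) < 1e-12

# Triangle inequality for this pair.
triangle_holds = abs(alpha + beta) <= abs(alpha) + abs(beta)
print(total, product, norm_squared, lam, is_inverse, triangle_holds)
```

The value `norm_squared` is the real number $a^2 + b^2 = 13$, the square of the absolute value of $2 + 3i$.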

Using some elementary facts of analysis, we shall now prove: 

Theorem. The complex numbers are algebraically closed, in other words,
every polynomial $f \in \mathbf{C}[t]$ of degree $\geq 1$ has a root in $\mathbf{C}$.


Proof. We may write

$$f(t) = a_n t^n + a_{n-1} t^{n-1} + \cdots + a_0$$

with $a_n \neq 0$. For every real $R > 0$, the function $|f|$ such that

$$t \mapsto |f(t)|$$

is continuous on the closed disc of radius $R$, and hence has a minimum
value on this disc. On the other hand, from the expression

$$f(t) = a_n t^n \left(1 + \frac{a_{n-1}}{a_n t} + \cdots + \frac{a_0}{a_n t^n}\right)$$

we see that when $|t|$ becomes large, then $|f(t)|$ also becomes large, i.e.
given $C > 0$ there exists $R > 0$ such that if $|t| > R$ then $|f(t)| > C$.
Consequently, there exists a positive number $R_0$ such that, if $z_0$ is a
minimum point of $|f|$ on the closed disc of radius $R_0$, then


1/(01 ^ l/(z 0 )l 


for all complex numbers t. In other words, z 0 is an absolute minimum 
for |/|. We shall prove that f(z 0 ) = 0. 
We express $f$ in the form 

\[ f(t) = c_0 + c_1(t - z_0) + \cdots + c_n(t - z_0)^n \]

with constants $c_i$. (We did it in the text, but one also sees it by writing 
$t = z_0 + (t - z_0)$ and substituting directly in $f(t)$.) If $f(z_0) \neq 0$, then 
$c_0 = f(z_0) \neq 0$. Let $z = t - z_0$, and let $m$ be the smallest integer $> 0$ 
such that $c_m \neq 0$. This integer $m$ exists because $f$ is assumed to have 
degree $\geq 1$. Then we can write 

\[ f(t) = f_1(z) = c_0 + c_m z^m + z^{m+1} g(z) \]

for some polynomial $g$, and some polynomial $f_1$ (obtained from $f$ by 
changing the variable). Let $z_1$ be a complex number such that 

\[ z_1^m = -c_0/c_m, \]

and consider values of $z$ of type 

\[ z = \lambda z_1, \]

where $\lambda$ is real, $0 \leq \lambda \leq 1$. We have 

\[ f(t) = f_1(\lambda z_1) = c_0 - \lambda^m c_0 + \lambda^{m+1} z_1^{m+1} g(\lambda z_1) \]
\[ = c_0 \left[ 1 - \lambda^m + \lambda^{m+1} z_1^{m+1} c_0^{-1} g(\lambda z_1) \right]. \]

There exists a number $C > 0$ such that for all $\lambda$ with $0 \leq \lambda \leq 1$ we have 
$|z_1^{m+1} c_0^{-1} g(\lambda z_1)| \leq C$, and hence 

\[ |f_1(\lambda z_1)| \leq |c_0| (1 - \lambda^m + C\lambda^{m+1}). \]

If we can now prove that for sufficiently small $\lambda$ with $0 < \lambda < 1$ we have 

\[ 0 < 1 - \lambda^m + C\lambda^{m+1} < 1, \]

then for such $\lambda$ we get $|f_1(\lambda z_1)| < |c_0|$, thereby contradicting the 
hypothesis that $|f(z_0)| \leq |f(t)|$ for all complex numbers $t$. The left inequality is 
of course obvious since $0 < \lambda < 1$. The right inequality amounts to 
$C\lambda^{m+1} < \lambda^m$, or equivalently $C\lambda < 1$, which is certainly satisfied for 
sufficiently small $\lambda$. This concludes the proof. 
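The key step of the proof is constructive: at a point $z_0$ with $f(z_0) \neq 0$, moving a short distance along $z_1$ lowers $|f|$. The sketch below illustrates this step numerically. The helper names (`evaluate`, `taylor_coefficients`, `decrease_step`), the tolerance, and the halving of $\lambda$ are our own assumptions; the proof only asserts that a sufficiently small $\lambda$ exists.

```python
import cmath


def evaluate(coeffs, t):
    """Horner evaluation of f(t) = coeffs[0] + coeffs[1]*t + ... ."""
    value = 0
    for a in reversed(coeffs):
        value = value * t + a
    return value


def taylor_coefficients(coeffs, z0):
    """Return c_0, ..., c_n with f(t) = sum of c_k (t - z0)^k."""
    c = list(coeffs)
    n = len(c) - 1
    for k in range(n):                      # repeated synthetic division
        for j in range(n - 1, k - 1, -1):
            c[j] += z0 * c[j + 1]
    return c


def decrease_step(coeffs, z0, tol=1e-14):
    """If f(z0) != 0, return a point where |f| is strictly smaller."""
    c = taylor_coefficients(coeffs, z0)
    c0 = c[0]
    if abs(c0) <= tol:
        return z0                           # z0 is numerically a root
    # smallest m > 0 with c_m != 0 (exists because deg f >= 1)
    m = next(k for k in range(1, len(c)) if abs(c[k]) > tol)
    w = -c0 / c[m]
    z1 = cmath.rect(abs(w) ** (1.0 / m), cmath.phase(w) / m)  # z1**m == w
    lam = 1.0
    for _ in range(60):                     # shrink lambda until |f| drops
        candidate = z0 + lam * z1
        if abs(evaluate(coeffs, candidate)) < abs(c0):
            return candidate
        lam /= 2.0
    return z0                               # rounding prevented a decrease


f = [1, 0, 1]                 # f(t) = t^2 + 1, coefficients a_0, a_1, a_2
z = decrease_step(f, 0)       # at z0 = 0 we have c_1 = 0, so m = 2
print(z, abs(evaluate(f, z)))
```

For $f(t) = t^2 + 1$ and $z_0 = 0$ the step lands essentially on the root $i$, because here $g = 0$ and $\lambda = 1$ already works. In general a single step only decreases $|f|$; it does not locate a root.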
APP. EXERCISES 

1. Express the following complex numbers in the form $x + iy$, where $x, y$ are real 
numbers. 

(a) $(-1 + 3i)^{-1}$    (b) $(1 + i)(1 - i)$ 
(c) $(1 + i)i(2 - i)$    (d) $(i - 1)(2 - i)$ 
(e) $(7 + \pi i)(\pi + i)$    (f) $(2i + 1)\pi i$ 
(g) $(\sqrt{2} + i)(\pi + 3i)$    (h) $(i + 1)(i - 2)(i + 3)$ 

2. Express the following complex numbers in the form $x + iy$, where $x, y$ are real 
numbers. 

(a) $(1 + i)^{-1}$ 

(b) $\dfrac{1}{3 + i}$ 

(c) $\dfrac{2 + i}{2 - i}$ 

(d) 

(e) $\dfrac{1 + i}{i}$ 

(f) $\dfrac{i}{1 + i}$ 

(g) $\dfrac{2i}{3 - i}$ 

(h) 


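3. Let $\alpha$ be a complex number $\neq 0$. What is the absolute value of $\alpha/\bar{\alpha}$? What is 
$\bar{\bar{\alpha}}$? 

4. Let $\alpha, \beta$ be two complex numbers. Show that $\overline{\alpha\beta} = \bar{\alpha}\bar{\beta}$ and that 

\[ \overline{\alpha + \beta} = \bar{\alpha} + \bar{\beta}. \]

5. Show that $|\alpha\beta| = |\alpha| |\beta|$. 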

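6. Define addition of $n$-tuples of complex numbers componentwise, and 
multiplication of $n$-tuples of complex numbers by complex numbers componentwise 
also. If $A = (\alpha_1, \ldots, \alpha_n)$ and $B = (\beta_1, \ldots, \beta_n)$ are $n$-tuples of complex numbers, 
define their product $\langle A, B \rangle$ to be 

\[ \alpha_1\bar{\beta}_1 + \cdots + \alpha_n\bar{\beta}_n \]

(note the complex conjugation!). Prove the following rules: 

HP 1. $\langle A, B \rangle = \overline{\langle B, A \rangle}$. 

HP 2. $\langle A, B + C \rangle = \langle A, B \rangle + \langle A, C \rangle$. 

HP 3. If $\alpha$ is a complex number, then 

\[ \langle \alpha A, B \rangle = \alpha \langle A, B \rangle \quad\text{and}\quad \langle A, \alpha B \rangle = \bar{\alpha} \langle A, B \rangle. \]

HP 4. If $A = O$ then $\langle A, A \rangle = 0$, and otherwise $\langle A, A \rangle > 0$. 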

7. We assume that you know about the functions sine and cosine, and their 
addition formulas. Let $\theta$ be a real number. 

(a) Define 

\[ e^{i\theta} = \cos\theta + i\sin\theta. \]

Show that if $\theta_1$ and $\theta_2$ are real numbers, then 

\[ e^{i(\theta_1 + \theta_2)} = e^{i\theta_1} e^{i\theta_2}. \]

Show that any complex number of absolute value 1 can be written in the 
form $e^{it}$ for some real number $t$. 

(b) Show that any complex number can be written in the form $re^{i\theta}$ for some 
real numbers $r, \theta$ with $r \geq 0$. 

(c) If $z_1 = r_1 e^{i\theta_1}$ and $z_2 = r_2 e^{i\theta_2}$ with real $r_1, r_2 \geq 0$ and real $\theta_1, \theta_2$, show that 

\[ z_1 z_2 = r_1 r_2 e^{i(\theta_1 + \theta_2)}. \]

(d) If $z$ is a complex number, and $n$ an integer $> 0$, show that there exists a 
complex number $w$ such that $w^n = z$. If $z \neq 0$ show that there exist $n$ 
distinct such complex numbers $w$. [Hint: If $z = re^{i\theta}$, consider first $r^{1/n}e^{i\theta/n}$.] 
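The hint in part (d) can be explored numerically before writing a proof. The sketch below is only an illustration; the function name `nth_roots` is our own, and it uses the polar form $re^{i\theta}$ through the standard `cmath` module.

```python
import cmath
import math


def nth_roots(z, n):
    """Return n complex numbers w with w**n == z (up to rounding)."""
    r, theta = cmath.polar(z)               # z = r * e^{i theta}
    root_r = r ** (1.0 / n)
    # shifting theta by multiples of 2*pi gives the other roots
    return [cmath.rect(root_r, (theta + 2 * math.pi * k) / n)
            for k in range(n)]


roots = nth_roots(-8, 3)
for w in roots:
    print(w, w ** 3)                        # each cube is close to -8
```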

8. Assuming the complex numbers algebraically closed, prove that every 
irreducible polynomial over the real numbers has degree 1 or 2. [Hint: Split the 
polynomial over the complex numbers and pair off complex conjugate roots.] 

Let $SL_n$ denote the set of matrices with determinant 1. The purpose of this 
appendix is to formulate in some general terms results about $SL_n$. We shall 
use the language of group theory, which has not been used previously, so 
we have to start with the definition of a group. 

Let $G$ be a set. We are given a mapping $G \times G \to G$, which at first we 
write as a product, i.e. to each pair of elements $(x, y)$ of $G$ we associate an 
element of $G$ denoted by $xy$, satisfying the following axioms. 

GR 1. The product is associative, namely for all $x, y, z \in G$ we have 

\[ (xy)z = x(yz). \]

GR 2. There is an element $e \in G$ such that $ex = xe = x$ for all $x \in G$. 

GR 3. Given $x \in G$ there exists an element $x^{-1} \in G$ such that 

\[ xx^{-1} = x^{-1}x = e. \]

It is an easy exercise to show that the element in GR 2 is uniquely 
determined, and it is called the unit element. The element $x^{-1}$ in GR 3 is 
also easily shown to be uniquely determined, and is called the inverse of 
$x$. A set together with a mapping satisfying the three axioms is called a 
group. 

Example. Let $G = SL_n(\mathbf{R})$. Let the product be the multiplication of 
matrices. Then $SL_n(\mathbf{R})$ is a group. Similarly, $SL_n(\mathbf{C})$ is a group. The unit 
element is the unit matrix $I$. 

Example. Let $G$ be a group and let $H$ be a subset which contains the 
unit element, and is closed under taking products and inverses, i.e. if 
$x, y \in H$ then $x^{-1} \in H$ and $xy \in H$. Then $H$ is a group under the "same" 
product as in $G$, and is called a subgroup. We shall now consider some 
important subgroups. 

Let $G = SL_n(\mathbf{R})$. Note that when $n$ is even, so that $\det(-I) = 1$, the 
subset consisting of the two elements $I, -I$ is a subgroup. Also note that 
$SL_n(\mathbf{R})$ is a subgroup of the group $GL_n(\mathbf{R})$ (all real matrices with 
non-zero determinant). 
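The group axioms for $SL_2$ can be spot-checked with exact rational arithmetic. This is a minimal sketch for $2 \times 2$ matrices only; the helper names `det2`, `mul2`, `inv2` are our own, and a check on sample matrices is of course not a proof.

```python
from fractions import Fraction


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def mul2(x, y):
    return [[sum(x[i][q] * y[q][j] for q in range(2)) for j in range(2)]
            for i in range(2)]


def inv2(m):
    """Inverse of a 2 x 2 matrix with non-zero determinant."""
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]


I2 = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]
minus_I = [[Fraction(-1), Fraction(0)], [Fraction(0), Fraction(-1)]]
x = [[Fraction(2), Fraction(1)], [Fraction(1), Fraction(1)]]
y = [[Fraction(1), Fraction(3)], [Fraction(0), Fraction(1)]]

print(det2(mul2(x, y)))             # product stays in SL_2
print(det2(inv2(x)))                # inverse stays in SL_2
print(mul2(minus_I, minus_I) == I2) # {I, -I} is closed under products
```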

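We shall now express Theorem 2.1 of Chapter V in the context of 
groups and subgroups. Let: 

$U$ = subgroup of upper triangular matrices with 1's on the diagonal, 

\[ u(X) = \begin{pmatrix} 1 & x_{12} & \cdots & x_{1n} \\ 0 & 1 & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}, \]

called unipotent. 

$A$ = subgroup of positive diagonal elements: 

\[ a = \begin{pmatrix} a_1 & & \\ & \ddots & \\ & & a_n \end{pmatrix} \quad\text{with } a_i > 0 \text{ for all } i. \]

$K$ = subgroup of real unitary matrices $k$, satisfying ${}^t k = k^{-1}$. 

Theorem 1 (Iwasawa decomposition). The product map $U \times A \times K \to G$ 
given by 

\[ (u, a, k) \mapsto uak \]

is a bijection. 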

Proof. Let $e_1, \ldots, e_n$ be the standard unit vectors of $\mathbf{R}^n$ (vertical). Let 
$g = (g_{ij}) \in G$. Then we have 

\[ g e_i = g^{(i)} = \sum_{q=1}^{n} g_{qi} e_q, \]

the $i$-th column of $g$. There exists an upper triangular matrix $B = (b_{ij})$, 
so with $b_{ij} = 0$ if $i > j$, such that 

\[ \begin{aligned} b_{11} g^{(1)} &= e'_1 \\ b_{12} g^{(1)} + b_{22} g^{(2)} &= e'_2 \\ &\ \ \vdots \\ b_{1j} g^{(1)} + b_{2j} g^{(2)} + \cdots + b_{jj} g^{(j)} &= e'_j \\ &\ \ \vdots \\ b_{1n} g^{(1)} + b_{2n} g^{(2)} + \cdots + b_{nn} g^{(n)} &= e'_n, \end{aligned} \]

such that the diagonal elements are positive, that is $b_{11}, \ldots, b_{nn} > 0$, and 
such that the vectors $e'_1, \ldots, e'_n$ are mutually perpendicular unit vectors. 
Getting such a matrix $B$ is merely applying the usual Gram-Schmidt 
orthogonalization process, subtracting a linear combination of previous 
vectors to get orthogonality, and then dividing by the norms to get unit 
vectors. Thus 

\[ e'_j = \sum_{i=1}^{n} b_{ij} g^{(i)} = \sum_{i=1}^{n} \sum_{q=1}^{n} g_{qi} b_{ij} e_q = \sum_{q=1}^{n} \left( \sum_{i=1}^{n} g_{qi} b_{ij} \right) e_q. \]

Let $gB = k$. Then $ke_j = e'_j$, so $k$ maps the orthogonal unit vectors 
$e_1, \ldots, e_n$ to the orthogonal unit vectors $e'_1, \ldots, e'_n$. Therefore $k$ is unitary, 
so $k \in K$, and $g = kB^{-1}$. Then 

\[ g^{-1} = Bk^{-1} \quad\text{and}\quad B = au \]

where $a$ is the diagonal matrix with $a_i = b_{ii}$ and $u$ is unipotent, $u = a^{-1}B$. 
Since $au = (aua^{-1})a$ with $aua^{-1} \in U$, we get $g^{-1} \in UAK$, and $g^{-1}$ ranges 
over all of $G$ when $g$ does. This proves the surjection $G = UAK$. For 
uniqueness of the decomposition, if $g = uak = u'a'k'$, let $u_1 = u^{-1}u'$, so 
using $g\,{}^t g$ you get 

\[ a^2 \, {}^t u_1^{-1} = u_1 a'^2. \]

These matrices are lower and upper triangular respectively, with 
diagonals $a^2, a'^2$, so $a = a'$, and finally $u_1 = I$, proving uniqueness. 
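To see the decomposition on a concrete matrix, one can run Gram-Schmidt numerically. The sketch below is our own variant, not the exact bookkeeping of the proof: it orthogonalizes the rows of $g$ from the bottom up, which yields $g = Tk$ with $T = ua$ upper triangular directly, instead of working with the columns and $g^{-1}$. The names `matmul` and `iwasawa` are hypothetical helpers.

```python
import math


def matmul(A, B):
    return [[sum(A[i][q] * B[q][j] for q in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def iwasawa(g):
    """Return (u, a, k) with g = u a k for an invertible real matrix g."""
    n = len(g)
    k = [None] * n                       # rows of k, orthonormal
    T = [[0.0] * n for _ in range(n)]    # T = u a, upper triangular
    for i in range(n - 1, -1, -1):       # bottom row first
        v = list(g[i])
        for j in range(i + 1, n):
            T[i][j] = sum(g[i][q] * k[j][q] for q in range(n))
            v = [v[q] - T[i][j] * k[j][q] for q in range(n)]
        T[i][i] = math.sqrt(sum(x * x for x in v))
        k[i] = [x / T[i][i] for x in v]
    a = [[T[i][i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    u = [[T[i][j] / T[j][j] for j in range(n)] for i in range(n)]
    return u, a, k


g = [[2.0, 1.0], [1.0, 1.0]]             # determinant 1
u, a, k = iwasawa(g)
print(u)                                 # unipotent
print(a)                                 # positive diagonal
print(k)                                 # rows are orthonormal
```

For this $g$ one finds $u = u(3/2)$ and $a = \mathrm{diag}(1/\sqrt{2}, \sqrt{2})$ up to rounding, and multiplying $u$, $a$, $k$ back together recovers $g$.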

The elements of $U$ are called unipotent because they are of the form 

\[ u(X) = I + X, \]

where $X$ is strictly upper triangular, and $X^{n+1} = 0$. Thus $X = u - I$ is 
called nilpotent. Let 

\[ \exp Y = \sum_{j=0}^{\infty} \frac{Y^j}{j!} \]

and 

\[ \log(I + X) = \sum_{i=1}^{\infty} (-1)^{i+1} \frac{X^i}{i}. \]

Let $\mathfrak{n}$ denote the space of all strictly upper triangular matrices. Then 

\[ \exp: \mathfrak{n} \to U, \quad Y \mapsto \exp Y \]

is a bijection, whose inverse is given by the log series, $Y = \log(I + X)$. 
Note that, because of the nilpotency, the exp and log series are actually 
polynomials, defining inverse polynomial mappings between $U$ and $\mathfrak{n}$. The 
bijection actually holds over any field of characteristic 0. The relations 

\[ \exp \log(I + X) = I + X \quad\text{and}\quad \log \exp Y = \log(I + X) = Y \]

hold as identities of formal power series. Cf. my Complex Analysis, 
Chapter II, §3, Exercise 2. 
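Because the series terminate, exp and log on these matrices can be computed exactly over the rationals. The following sketch does this with Python's `Fraction` type; the helper names are our own, and the sums stop at the power $n - 1$ because a strictly upper triangular $n \times n$ matrix already satisfies $Y^n = 0$.

```python
from fractions import Fraction
from math import factorial


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][q] * B[q][j] for q in range(n)) for j in range(n)]
            for i in range(n)]


def add(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]


def scale(c, A):
    return [[c * x for x in row] for row in A]


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def exp_nilpotent(Y):
    """exp Y = sum of Y^j / j!, a finite sum for strictly upper triangular Y."""
    n = len(Y)
    result, power = identity(n), identity(n)
    for j in range(1, n):
        power = matmul(power, Y)
        result = add(result, scale(Fraction(1, factorial(j)), power))
    return result


def log_unipotent(U):
    """log(I + X) = sum of (-1)^(i+1) X^i / i with X = U - I."""
    n = len(U)
    X = add(U, scale(-1, identity(n)))
    result, power = scale(0, identity(n)), identity(n)
    for i in range(1, n):
        power = matmul(power, X)
        result = add(result, scale(Fraction((-1) ** (i + 1), i), power))
    return result


Y = [[Fraction(0), Fraction(1), Fraction(2)],
     [Fraction(0), Fraction(0), Fraction(3)],
     [Fraction(0), Fraction(0), Fraction(0)]]
E = exp_nilpotent(Y)
print(E)                       # unipotent, with entry 7/2 in the corner
print(log_unipotent(E) == Y)   # the log series inverts exp exactly
```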

Geometric interpretation in dimension 2 

Let $\mathfrak{h}_2$ be the upper half plane of complex numbers $z = x + iy$ with 
$x, y \in \mathbf{R}$ and $y > 0$, $y = y(z)$. For 

\[ g = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in G = SL_2(\mathbf{R}) \]

define 

\[ g(z) = (az + b)(cz + d)^{-1}. \]

Then $G$ acts on $\mathfrak{h}_2$, meaning that the following two conditions are satisfied: 

If $I$ is the unit matrix, then $I(z) = z$ for all $z$. 

For $g, g' \in G$ we have $g(g'(z)) = (gg')(z)$. 

Also note the property: 

If $g(z) = z$ for all $z$, then $g = \pm I$. 

To see that if $z \in \mathfrak{h}_2$ then $g(z) \in \mathfrak{h}_2$ also, you will need to check the 
transformation formula 

\[ y(g(z)) = \frac{y(z)}{|cz + d|^2}, \]

proved by direct computation. 

These statements are proved by (easy) brute force. In addition, for 
$w \in \mathfrak{h}_2$, let $G_w$ be the subset of elements $g \in G$ such that $g(w) = w$. Then $G_w$ 
is a subgroup of $G$, called the isotropy group of $w$. Verify that: 

Theorem 2. The isotropy group of $i$ is $K$, i.e. $K$ is the subgroup of 
elements $k \in G$ such that $k(i) = i$. This is the group of matrices 

\[ \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}. \]

Or equivalently, $a = d$, $c = -b$, $a^2 + b^2 = 1$. 
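The "brute force" checks can be previewed numerically. The sketch below is a floating point illustration on sample values, with hypothetical helper names `act` and `mul2`; it tests the composition rule, the formula for $y(g(z))$, and the fact that a rotation matrix fixes $i$.

```python
import math


def act(g, z):
    """Fractional linear action g(z) = (az + b)/(cz + d)."""
    (a, b), (c, d) = g
    return (a * z + b) / (c * z + d)


def mul2(g, h):
    return tuple(tuple(sum(g[i][q] * h[q][j] for q in range(2))
                       for j in range(2)) for i in range(2))


g = ((2.0, 1.0), (1.0, 1.0))             # determinant 1
h = ((1.0, -3.0), (0.0, 1.0))            # determinant 1
z = 0.3 + 2.0j

theta = 0.7
k = ((math.cos(theta), math.sin(theta)),
     (-math.sin(theta), math.cos(theta)))

print(act(g, act(h, z)), act(mul2(g, h), z))      # equal up to rounding
print(act(g, z).imag, z.imag / abs(z + 1) ** 2)   # here c = d = 1
print(act(k, 1j))                                 # close to i
```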
For $x \in \mathbf{R}$ and $a_1 > 0$, let 

\[ u(x) = \begin{pmatrix} 1 & x \\ 0 & 1 \end{pmatrix} \quad\text{and}\quad a = \begin{pmatrix} a_1 & 0 \\ 0 & a_2 \end{pmatrix} \quad\text{with } a_2 = a_1^{-1}. \]

If $g = uak$, then $u(x)(z) = z + x$, so putting $y = a_1^2$, we get $a(i) = yi$, 

\[ g(i) = uak(i) = ua(i) = yi + x = x + iy. \]

Thus $G$ acts transitively, and we have a description of the action in terms 
of the Iwasawa decomposition and the coordinates of the upper half plane. 
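Transitivity is explicit: to reach a given point $x + iy$ from $i$, take $u(x)$ and $a$ with $a_1 = \sqrt{y}$. A minimal numerical sketch, where the function name `element_sending_i_to` is our own:

```python
import math


def act(g, z):
    (a, b), (c, d) = g
    return (a * z + b) / (c * z + d)


def mul2(g, h):
    return tuple(tuple(sum(g[i][q] * h[q][j] for q in range(2))
                       for j in range(2)) for i in range(2))


def element_sending_i_to(w):
    """Return g = u(x) a in SL_2(R) with g(i) = w, for w = x + iy, y > 0."""
    x, y = w.real, w.imag
    a1 = math.sqrt(y)                    # so that y = a1^2
    u = ((1.0, x), (0.0, 1.0))
    a = ((a1, 0.0), (0.0, 1.0 / a1))
    return mul2(u, a)


w = -1.5 + 0.25j
g = element_sending_i_to(w)
print(g)
print(act(g, 1j))                        # close to w
```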

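Geometric interpretation in dimension 3 

We hope you know the quaternions, whose elements are 

\[ z = x_1 + x_2\mathbf{i} + x_3\mathbf{j} + x_4\mathbf{k} \quad\text{with } x_1, x_2, x_3, x_4 \in \mathbf{R} \]

and $\mathbf{i}^2 = \mathbf{j}^2 = \mathbf{k}^2 = -1$, $\mathbf{ij} = \mathbf{k}$, $\mathbf{jk} = \mathbf{i}$, $\mathbf{ki} = \mathbf{j}$. Define 

\[ \bar{z} = x_1 - x_2\mathbf{i} - x_3\mathbf{j} - x_4\mathbf{k}. \]

Then 

\[ z\bar{z} = x_1^2 + x_2^2 + x_3^2 + x_4^2, \]

and we define $|z| = (z\bar{z})^{1/2}$. 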

Let $\mathfrak{h}_3$ be the upper half space consisting of elements $z$ whose 
$\mathbf{k}$-component is 0, and $x_3 > 0$, so we write 

\[ z = x_1 + x_2\mathbf{i} + y\mathbf{j} \quad\text{with } y > 0. \]

Let $G = SL_2(\mathbf{C})$, so elements of $G$ are matrices 

\[ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \quad\text{with } a, b, c, d \in \mathbf{C} \text{ and } ad - bc = 1. \]

As in the case of $\mathfrak{h}_2$, define 

\[ g(z) = (az + b)(cz + d)^{-1}. \]

Verify by brute force that if $z \in \mathfrak{h}_3$ then $g(z) \in \mathfrak{h}_3$, and that $G$ acts on $\mathfrak{h}_3$, 
namely the two properties listed in the previous example are also satisfied 
here. Since the quaternions are not commutative, we have to use the 
quotient as written $(az + b)(cz + d)^{-1}$. Also note that the $y$-coordinate 
transformation formula for $z \in \mathfrak{h}_3$ reads the same as for $\mathfrak{h}_2$, namely 

\[ y(g(z)) = y(z)/|cz + d|^2. \]
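The quaternionic action can be tested on sample values before doing the brute force verification by hand. In the sketch below a quaternion is a 4-tuple $(x_1, x_2, x_3, x_4)$, and a complex number $a_1 + a_2 i$ is embedded as $(a_1, a_2, 0, 0)$; the helper names and this representation are our own assumptions.

```python
def qmul(p, q):
    """Hamilton product, with ij = k, jk = i, ki = j."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)


def qadd(p, q):
    return tuple(s + t for s, t in zip(p, q))


def qnorm_sq(p):
    return sum(t * t for t in p)


def qinv(p):
    n = qnorm_sq(p)
    return (p[0] / n, -p[1] / n, -p[2] / n, -p[3] / n)   # conjugate / |p|^2


def from_complex(c):
    c = complex(c)
    return (c.real, c.imag, 0.0, 0.0)


def act(g, z):
    """Return g(z) = (az + b)(cz + d)^{-1} together with |cz + d|^2."""
    (a, b), (c, d) = g
    num = qadd(qmul(from_complex(a), z), from_complex(b))
    den = qadd(qmul(from_complex(c), z), from_complex(d))
    return qmul(num, qinv(den)), qnorm_sq(den)


g = ((1 + 1j, 1j), (1, 1))               # ad - bc = (1 + i) - i = 1
z = (0.5, -0.25, 2.0, 0.0)               # x_1 + x_2 i + y j with y = 2
w, den_sq = act(g, z)
print(w)                                 # k-component is 0 up to rounding
print(w[2], z[2] / den_sq)               # y(g(z)) and y(z)/|cz + d|^2
```

The order of the factors in `act` matters: multiplying by the inverse on the left instead of on the right gives a different result in general.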
The group $G = SL_2(\mathbf{C})$ has the Iwasawa decomposition 

\[ G = UAK, \]

where: 

$U$ = group of elements $u(x) = \begin{pmatrix} 1 & x \\ 0 & 1 \end{pmatrix}$ with $x \in \mathbf{C}$; 

$A$ = same group as before in the case of $SL_2(\mathbf{R})$; 

$K$ = complex unitary group of elements $k$ such that ${}^t\bar{k} = k^{-1}$. 

The previous proof works the same way, BUT you can verify directly: 

Theorem 3. The isotropy group $G_{\mathbf{j}}$ is $K$. 

If $g = uak$ with $u \in U$, $a \in A$, $k \in K$, $u = u(x)$ and $y = y(a)$, then 

\[ g(\mathbf{j}) = x + y\mathbf{j}. \]

Thus $G$ acts transitively, and the Iwasawa decomposition follows trivially 
from this group action (see below). Thus the orthogonalization type proof 
can be completely avoided. 

Proof of the Iwasawa decomposition from the above two properties. Let 
$g \in G$ and $g(\mathbf{j}) = x + y\mathbf{j}$. Let $u = u(x)$ and $a$ be such that $y = a_1/a_2 = a_1^2$. 
Let $g' = ua$. Then by the second property, we get $g(\mathbf{j}) = g'(\mathbf{j})$, so 
$\mathbf{j} = g^{-1}g'(\mathbf{j})$. By the first property, we get $g^{-1}g' = k$ for some $k \in K$, so 

\[ g'k^{-1} = uak^{-1} = g, \]

concluding the proof. 

The conjugation action 

By a homomorphism $f: G \to G'$ of a group into another we mean a 
mapping which satisfies the properties $f(e_G) = e_{G'}$ (where $e$ = unit 
element), and 

\[ f(g_1 g_2) = f(g_1)f(g_2) \quad\text{for all } g_1, g_2 \in G. \]

A homomorphism is called an isomorphism if it has an inverse 
homomorphism, i.e. if there exists a homomorphism $f': G' \to G$ such that 
$ff' = \mathrm{id}_{G'}$ and $f'f = \mathrm{id}_G$. An isomorphism of $G$ with itself is called an 
automorphism of $G$. You can verify at once that the set of automorphisms of 
$G$, denoted by $\mathrm{Aut}(G)$, is a group. The product in this group is the 
composition of mappings. Note that a bijective homomorphism is an 
isomorphism, just as for linear maps. 

Let $X$ be a set. A bijective map $\sigma: X \to X$ of $X$ with itself is called a 
permutation. You can verify at once that the set of permutations of $X$ is 
a group, denoted by $\mathrm{Perm}(X)$. By an action of a group $G$ on $X$ we mean a 
map 

\[ G \times X \to X \quad\text{denoted by}\quad (g, x) \mapsto gx, \]

satisfying the two properties: 

If $e$ is the unit element of $G$, then $ex = x$ for all $x \in X$. 

For all $g_1, g_2 \in G$ and $x \in X$ we have $g_1(g_2 x) = (g_1 g_2)x$. 

This is just a general formulation of action, of which we have seen an 
example above. Given $g \in G$, the map $x \mapsto gx$ of $X$ into itself is a 
permutation of $X$. You can verify this directly from the definition, namely the 
inverse permutation is given by $x \mapsto g^{-1}x$. Let $\sigma(g)$ denote the permutation 
associated with $g$. Then you can also verify directly from the definition 
that 

\[ g \mapsto \sigma(g) \]

is a homomorphism of $G$ into the group of permutations of $X$. Conversely, 
such a homomorphism gives rise to an action of $G$ on $X$. 

Let $G$ be a group. The conjugation action of $G$ on itself is defined for 
$g, g' \in G$ by 

\[ c(g)g' = gg'g^{-1}. \]

It is immediately verified that the map $g \mapsto c(g)$ is a homomorphism of $G$ 
into $\mathrm{Aut}(G)$ (the group of automorphisms of $G$). Then $G$ also acts on 
spaces naturally associated to $G$. 
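Both claims, that each $c(g)$ respects products and that $g \mapsto c(g)$ respects products, can be spot-checked on integer matrices of determinant 1, where all arithmetic is exact. The sketch is restricted to $2 \times 2$ matrices, and the helper names are our own.

```python
def mul2(x, y):
    return tuple(tuple(sum(x[i][q] * y[q][j] for q in range(2))
                       for j in range(2)) for i in range(2))


def inv2(g):
    """Inverse of a 2 x 2 matrix of determinant 1."""
    return ((g[1][1], -g[0][1]), (-g[1][0], g[0][0]))


def conj(g, x):
    """The conjugation c(g)x = g x g^{-1}."""
    return mul2(mul2(g, x), inv2(g))


g = ((2, 1), (1, 1))
h = ((1, 3), (0, 1))
x = ((1, 0), (2, 1))
y = ((0, -1), (1, 0))

# c(g) is a homomorphism of G into itself
print(conj(g, mul2(x, y)) == mul2(conj(g, x), conj(g, y)))
# g -> c(g) is a homomorphism: c(gh) = c(g) c(h)
print(conj(mul2(g, h), x) == conj(g, conj(h, x)))
```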

Consider the special case when $G = SL_n(\mathbf{R})$. Let 

$\mathfrak{a}$ = vector space of diagonal matrices $\mathrm{diag}(h_1, \ldots, h_n)$ with trace 0, 
$\sum h_i = 0$. 

$\mathfrak{n}$ = vector space of strictly upper triangular matrices $(h_{ij})$ with $h_{ij} = 0$ if 
$i \geq j$. 

${}^t\mathfrak{n}$ = vector space of strictly lower triangular matrices, 

$\mathfrak{g}$ = vector space of $n \times n$ matrices of trace 0. 

Then $\mathfrak{g}$ is the direct sum $\mathfrak{a} + \mathfrak{n} + {}^t\mathfrak{n}$, and $A$ acts by conjugation. In fact, $\mathfrak{g}$ 
is a direct sum of eigenspaces for this action. Indeed, let $E_{ij}$ ($i < j$) be the 
matrix with $ij$-component 1 and all other components 0. Then 

\[ c(a)E_{ij} = (a_i/a_j)E_{ij} = a^{\alpha_{ij}} E_{ij} \]

by direct computation, defining $a^{\alpha_{ij}} = a_i/a_j$. Thus $\alpha_{ij}$ is a homomorphism 
of $A$ into $\mathbf{R}^+$ (positive real multiplicative group). The set of such 
homomorphisms will be called the set of regular characters, denoted by $\mathcal{R}(\mathfrak{n})$ 
because $\mathfrak{n}$ is the direct sum of the 1-dimensional eigenspaces having basis 
$E_{ij}$ ($i < j$). We write 

\[ \mathfrak{n} = \bigoplus_{\alpha \in \mathcal{R}(\mathfrak{n})} \mathfrak{n}_\alpha, \]

where $\mathfrak{n}_\alpha$ is the set of elements $X \in \mathfrak{n}$ such that $aXa^{-1} = a^\alpha X$. We have 
similarly 

\[ {}^t\mathfrak{n} = \bigoplus_{\alpha} ({}^t\mathfrak{n})_{-\alpha}. \]

Note that $\mathfrak{a}$ is the 0-eigenspace for the conjugation action of $A$. 

Essentially the same structure holds for $SL_n(\mathbf{C})$ except that the 
$\mathbf{R}$-dimension of the eigenspaces $\mathfrak{n}_\alpha$ is 2, because $\mathfrak{n}_\alpha$ has basis $E_\alpha, iE_\alpha$. The 
$\mathbf{C}$-dimension is 1. 
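The eigenvalue computation $aE_{ij}a^{-1} = (a_i/a_j)E_{ij}$ is easy to confirm with exact arithmetic. The sketch below uses a sample diagonal matrix of determinant 1 with rational entries; the helper names `matmul`, `diag`, `elementary` are our own.

```python
from fractions import Fraction


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][q] * B[q][j] for q in range(n)) for j in range(n)]
            for i in range(n)]


def diag(entries):
    n = len(entries)
    return [[entries[i] if i == j else 0 for j in range(n)] for i in range(n)]


def elementary(n, i, j):
    """Matrix E_ij with ij-component 1 and all other components 0."""
    return [[int((p, q) == (i, j)) for q in range(n)] for p in range(n)]


entries = [Fraction(2), Fraction(3), Fraction(1, 6)]   # product is 1
a = diag(entries)
a_inv = diag([1 / t for t in entries])

for i in range(3):
    for j in range(3):
        if i != j:
            E = elementary(3, i, j)
            conjugated = matmul(matmul(a, E), a_inv)
            ratio = entries[i] / entries[j]
            expected = [[ratio * e for e in row] for row in E]
            print((i, j), ratio, conjugated == expected)
```

For $i > j$ the same loop exhibits the eigenvalue $a_i/a_j = (a_j/a_i)^{-1}$ on the lower triangular side, which corresponds to the character $-\alpha$.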

By an algebra we mean a vector space with a bilinear map into itself, 
called a product. We make $\mathfrak{g}$ into an algebra by defining the Lie product 
of $X, Y \in \mathfrak{g}$ to be 

\[ [X, Y] = XY - YX. \]

It is immediately verified that this product is bilinear but not associative. 
We call $\mathfrak{g}$ the Lie algebra of $G$. Let the space of linear maps $\mathcal{L}(\mathfrak{g}, \mathfrak{g})$ be 
denoted by $\mathrm{End}(\mathfrak{g})$, whose elements are called endomorphisms of $\mathfrak{g}$. By 
definition the regular representation of $\mathfrak{g}$ on itself is the map 

\[ \mathfrak{g} \to \mathrm{End}(\mathfrak{g}) \]

which to each $X \in \mathfrak{g}$ associates the endomorphism $L(X)$ of $\mathfrak{g}$ such that 

\[ L(X)(Y) = [X, Y]. \]

Note that $X \mapsto L(X)$ is a linear map (Chapter XI, §6, Exercise 7). 

Exercise. Verify that denoting $L(X)$ by $D_X$, we have the derivation 
property for all $Y, Z \in \mathfrak{g}$, namely 

\[ D_X[Y, Z] = [D_X Y, Z] + [Y, D_X Z]. \]

Using only the bracket notation, this looks like 

\[ [X, [Y, Z]] = [[X, Y], Z] + [Y, [X, Z]]. \]
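Before proving the derivation property, it can be checked on sample trace-zero matrices with integer entries, where the arithmetic is exact. The sketch below also shows the failure of associativity on the same matrices; the helper names are our own.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][q] * B[q][j] for q in range(n)) for j in range(n)]
            for i in range(n)]


def add(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]


def sub(A, B):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]


def bracket(X, Y):
    """Lie product [X, Y] = XY - YX."""
    return sub(matmul(X, Y), matmul(Y, X))


X = [[1, 2], [3, -1]]
Y = [[0, 1], [0, 0]]
Z = [[0, 0], [1, 0]]

lhs = bracket(X, bracket(Y, Z))
rhs = add(bracket(bracket(X, Y), Z), bracket(Y, bracket(X, Z)))
print(lhs, rhs)                          # both sides agree
print(bracket(bracket(X, Y), Z))         # differs from [X, [Y, Z]]
```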

We use $\alpha$ also to denote the character on $\mathfrak{a}$ given on a diagonal matrix 
$H = \mathrm{diag}(h_1, \ldots, h_n)$ by 

\[ \alpha_{ij}(H) = h_i - h_j. \]

This is the additive version of the character previously considered 
multiplicatively on $A$. Then each $\mathfrak{n}_\alpha$ is also the $\alpha$-eigenspace for 
the additive character $\alpha$, namely for $H \in \mathfrak{a}$, we have 

\[ [H, E_\alpha] = \alpha(H)E_\alpha, \]

which you can verify at once from the definition of multiplication of 
matrices. 

Polar Decompositions 

We list here more product decompositions in the notation of groups 
and subgroups. 

Let $G = SL_n(\mathbf{C})$. Let $U = U(\mathbf{C})$ be the set of upper triangular 
matrices with components in $\mathbf{C}$ and 1's on the diagonal. Show that $U$ is a 
subgroup. Let $D$ be the set of diagonal complex matrices with non-zero 
diagonal elements. Show that $D$ is a subgroup. Let $K$ be the set of elements 
$k \in SL_n(\mathbf{C})$ such that ${}^t\bar{k} = k^{-1}$. Then $K$ is a subgroup, the complex 
unitary group. Cf. Chapter VII, §3, Exercise 4. 

Verify that the proof of the Iwasawa decomposition works in the 
complex case, that is $G = UAK$, with the same $A$ in the real and complex 
cases. 

The quadratic map. Let $g \in G$. Define $g^* = {}^t\bar{g}$. Show that 

\[ (g_1 g_2)^* = g_2^* g_1^*. \]

An element $g \in G$ is hermitian if and only if $g = g^*$. Cf. Chapter VII, 
§2. Then $gg^*$ is hermitian positive definite, i.e. for every $v \in \mathbf{C}^n$, we have 
$\langle gg^*v, v \rangle \geq 0$, and $= 0$ only if $v = 0$. 

We denote by $\mathrm{SPos}_n(\mathbf{C})$ the set of all hermitian positive definite $n \times n$ 
matrices with determinant 1. 

Theorem 4. Let $p \in \mathrm{SPos}_n(\mathbf{C})$. Then $p$ has a unique square root in 
$\mathrm{SPos}_n(\mathbf{C})$. 

Proof. See Chapter VIII, §5, Exercise 1. 

Let $H$ be a subgroup of $G$. By a (left) coset of $H$, we mean a subset of 
$G$ of the form $gH$ with some $g \in G$. You can easily verify that two cosets 
are either equal or they are disjoint. By $G/H$ we mean the set of cosets of 
$H$ in $G$. 

Theorem 5. The quadratic map $g \mapsto gg^*$ induces a bijection 

\[ G/K \to \mathrm{SPos}_n(\mathbf{C}). \]

Proof. Exercise. Show injectivity and surjectivity separately. 

Theorem 6. The group $G$ has the decomposition (non-unique) 

\[ G = KAK. \]

If $g \in G$ is written as a product $g = k_1 b k_2$ with $k_1, k_2 \in K$ and $b \in A$, then 
$b$ is uniquely determined up to a permutation of the diagonal elements. 
Proof. Given $g \in G$ there exist $k_1 \in K$ and $b \in A$ such that 

\[ gg^* = k_1 b^2 k_1^{-1} \]

by using Chapter VIII, Theorem 4.4. By the bijection of Theorem 5, there 
exists $k_2 \in K$ such that $g = k_1 b k_2$, which proves the existence of the 
decomposition. As to the uniqueness, note that $b^2$ is the diagonal matrix of 
eigenvalues of $gg^*$, i.e. the diagonal elements are the roots of the 
characteristic polynomial, and these roots are uniquely determined up to a 
permutation, thus proving the theorem. 

Note that there is another version of the polar decomposition as 
follows. 

Theorem 7. Abbreviate $\mathrm{SPos}_n(\mathbf{C}) = P$. Then $G = PK$, and the 
decomposition of an element $g = pk$ with $p \in P$, $k \in K$ is unique. 

Proof. The existence is a rephrasing of Chapter VIII, §5, Exercise 4. As 
to uniqueness, suppose $g = pk$. The quadratic map gives $gg^* = pp^* = p^2$. 
The uniqueness of the square root in Theorem 4 shows that $p$ is uniquely 
determined by $g$, whence so is $k$, as was to be shown. 
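For a real $2 \times 2$ matrix of determinant 1 the polar decomposition can be written in closed form, which gives a quick numerical illustration of Theorem 7. The sketch below rests on two assumptions of ours that are not in the text: we restrict to real entries (so $g^* = {}^t g$), and we compute the square root of $m = gg^*$ by the $2 \times 2$ shortcut $p = (m + I)/\sqrt{\mathrm{tr}\, m + 2}$, which follows from the Cayley-Hamilton relation $p^2 = (\mathrm{tr}\, p)p - I$ when $\det p = 1$.

```python
import math


def matmul(A, B):
    return [[sum(A[i][q] * B[q][j] for q in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]


def polar_2x2(g):
    """Return (p, k) with g = p k, p symmetric positive definite, k orthogonal.

    Assumes g is a real 2 x 2 matrix of determinant 1.
    """
    m = matmul(g, transpose(g))                   # m = g g*
    t = math.sqrt(m[0][0] + m[1][1] + 2.0)        # trace of p
    p = [[(m[0][0] + 1.0) / t, m[0][1] / t],
         [m[1][0] / t, (m[1][1] + 1.0) / t]]
    p_inv = [[p[1][1], -p[0][1]], [-p[1][0], p[0][0]]]   # since det p = 1
    k = matmul(p_inv, g)
    return p, k


g = [[1.0, 1.0], [0.0, 1.0]]
p, k = polar_2x2(g)
print(p)          # proportional to [[3, 1], [1, 2]]
print(k)          # proportional to [[2, 1], [-1, 2]], a rotation
```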